Search Results: "anarcat"

4 October 2021

Paul Wise: FLOSS Activities September 2021

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian BTS: reopened bugs closed by a spammer
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The purple-discord/harmony/pyemd/librecaptcha/esprima-python work was sponsored by my employer. All other work was done on a volunteer basis.

5 September 2021

Antoine Beaupré: Automating major Debian upgrades

It's major upgrade time again! The Debian project just published the Debian 11 "bullseye" release, and it's pretty awesome! This made me realize that I have never written here about my peculiar upgrade process, and I figured it was worth bringing it up to a wider audience. My upgrade process also has a notable changes section which includes major version changes (e.g. Inkscape 1.0!), new packages (e.g. podman!) and important behavior changes (e.g. driverless scanning and printing!). I'm particularly interested to hear about any significant change I might have missed. If you know of a cool new package that shipped with bullseye and that I forgot, do let me know! But that's for the cool new stuff. We need to talk about the problems with Debian major upgrades.

Background I have been maintaining detailed upgrade guides, on my wiki, starting with the jessie release, but I have actually written such guides for Koumbit.org as far back as Debian squeeze in 2011 (another worker wrote the older Debian lenny upgrade guide in 2009). Koumbit, since then, has kept maintaining those guides all the way to the latest bullseye upgrade, through 7 major releases! Over the years, those guides evolved from a quick "cheat-sheet" format copied from the release notes into a more or less "scripted" form that I currently use. Each guide has a procedure made of a few steps that can be basically copy-pasted to batch-upgrade a host (or multiple hosts in parallel) as quickly as possible. There is also the predict-os script which allows you to keep track of progress of the upgrades in a Puppet cluster.

Limitations of the official procedure In comparison with my procedure, the official upgrade guide is mostly designed to upgrade a single machine, typically a workstation, with a rather slow and exhaustive process. The PDF version of the upgrade guide is 14 pages long! This, obviously, does not work when you have tens or hundreds of machines to upgrade. Debian upgrades are notorious for being extremely reliable, but we have a lot of packages, and there are always corner cases where the upgrade will just fail because of a bug specific to your environment. Those will only be fixed after some back and forth in the community (and that's assuming users report those bugs, which is not always the case). There's no obvious way to deploy "hot fixes" in this context, at least not without fixing the package and publishing it on an unofficial Debian archive while the official ones catch up. This is slow and difficult. Or some packages require manual labor. Examples of this are the PostgreSQL or Ganeti packages which require you to upgrade your clusters by hand, while the old and new packages live side by side. Debian packages bring you far in the upgrade process, but sometimes not all the way. Which means every Debian install needs to be manually upgraded and inspected when a new release comes out. That's slow and error prone and we can do better.

How to automate major upgrades I have a proposal to automate this. It's been mostly dormant in the Debian wiki, for 5 years now. Fundamentally, this is a hard problem: Debian gets installed in so many different environments, from workstations to physical servers to virtual machines, embedded systems and so on, that it's extremely hard to come up with a "one size fits all" system. The (manual) procedure I'm using is mostly targeting servers, but I'm also using it on workstations. And I'll note that it's specific to my home setup: I have a different procedure at work, although it has a lot of common code. To automate this, I would factor out that common code with hooks where you could easily inject special code like "you need to upgrade ferm first", "you need an extra reboot here", or "this is how you finish the PostgreSQL upgrade". With Debian getting closer to a 2-year release cycle, and the previous release being supported basically only one year after the new stable comes out, I feel more and more strongly that this needs better automation. So I'm thinking that I should write a prototype for this. Ubuntu has do-release-upgrade, but it is too Ubuntu-specific to be reused. An attempt at collaborating on this has been mostly met with silence from Ubuntu's side as well. I'm thinking of using something like Fabric, Mitogen, or Transilience: anything that will allow me to write simple, portable Python code that can run transparently from a local machine (for single-system upgrades, possibly with a GUI frontend) to remote servers (for large clusters of servers, maybe with canaries and grouping using Cumin). I'll note that Koumbit started experimenting with Puppet Bolt in the bullseye upgrade process, but that feels too site-specific to be useful more broadly.
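To make that "common code with hooks" idea more concrete, here is a minimal, hypothetical sketch in plain shell of what such a batch-upgrade wrapper could look like. The host list, hook layout, and upgrade commands are assumptions for illustration, not my actual procedure:
#!/bin/sh
# upgrade-to-bullseye.sh -- hypothetical sketch of a hook-based batch upgrade
# usage: ./upgrade-to-bullseye.sh host1 host2 ...
set -e
for host in "$@"; do
    echo "=== upgrading $host ==="
    # site-specific pre-hook, e.g. "you need to upgrade ferm first"
    [ -x "hooks/$host.pre" ] && "hooks/$host.pre" "$host"
    ssh "root@$host" sh -e <<'EOF'
# switch the suite (note: the security suite was also renamed to
# bullseye-security, which a naive sed like this does not handle)
sed -i 's/buster/bullseye/g' /etc/apt/sources.list
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get -y \
    -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold \
    full-upgrade
EOF
    # site-specific post-hook, e.g. "this is how you finish the PostgreSQL upgrade"
    [ -x "hooks/$host.post" ] && "hooks/$host.post" "$host"
done
Running hosts in parallel, with canaries, logging and proper error handling, is exactly the part where a real framework like Fabric or Mitogen would help.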

Trade-offs I am not sure where this stands in the XKCD time trade-off evaluation, because the table doesn't actually cover the frequency of Debian releases (which is basically "biennial") or the amount of time the upgrade takes across a cluster (which varies a lot, but which I estimate to be between one and six hours per machine). Assuming I have 80 machines to upgrade, that is 80 to 480 hours (roughly 3 to 20 days) of work! It's unclear how much work such an automated system would shave off, however. Assuming things are an order of magnitude faster (say I upgrade 10 machines at a time), I would shave off between 3 and 18 days of work, which implies I might allow myself to spend a minimum of 5 days working on such a project.

The other option: never upgrade Before people mention those: I am aware of containers, Kubernetes, and other deployment mechanisms. Indeed, those may be a long-term solution, but we can't afford to migrate everything over to containers right now: that is a huge migration and a total paradigm shift. At that point, whatever is left might not even be Debian in the first place. And besides, if you run Kubernetes, you still need to run some OS underneath and upgrade that, so that problem never completely disappears. Still, maybe that's the final answer: never upgrade. For some stateless machines like DNS replicas or load balancers, that might make a lot of sense as there's little or no data to carry to the new host. But this implies a seamless and fast provisioning process, and we don't have that either: at my work, installing a machine takes about as long as upgrading it, and that's after a significant amount of work automating that process, partly writing my own Debian installer with Fabric (!).

What is your process? I'm curious to hear what people think of those ideas. It strikes me as really odd that no one has really tackled that problem yet, considering how many clusters of Debian machines are out there. Surely people are upgrading those, and not following that slow step by step guide, right? I suspect everyone is doing the same thing: we all have our little copy-paste script we batch onto multiple machines, sometimes in parallel. That is what the Debian.org sysadmins are doing as well. There must be a better way. What is yours?

My upgrades so far So far, I have upgraded 2 out of my 3 home machines running buster -- others have been installed directly in bullseye -- with only my main, old, messy server left. Upgrades have been pretty painless so far (see another report, for example), much better than the previous buster upgrade. Obviously, for my personal use, automating this is pointless. Work-side, however, is another story: we have over 80 boxes to upgrade there and that will take a while. The last stretch-to-buster upgrade cycle took about two years to complete, so we might be done by the time the next release (12, "bookworm") comes out, but that's actually a full year after "buster" becomes EOL, so it's actually too late... At least I fixed the installers so that the new machines we create all ship with bullseye, so we stopped accumulating new buster hosts...
Thanks to lelutin and pabs for reviewing a draft of this post.

21 July 2021

Antoine Beaupré: Hacking my Kobo Clara HD

I just got a new Kobo ebook reader, a Kobo Clara HD. It's pretty similar to the Glo HD I had, which unfortunately died after 5 years, even after I tried replacing the battery.

Quick hardware review This is a neat little device. It's very similar to the Glo HD, which is a bit disappointing: you'd think they would have improved on the design in the 5+ years since the Glo HD came out. It does have an "amber" night light which is nice, but the bezel is still not level with the display, and the device is still kind of on the thick side. A USB-C (instead of micro-USB) port would have been nice too. But otherwise, it's pretty slick, and just works. And because the hardware design didn't change, I can still hack at it like a madman, which is really why I bought this thing in the first place. Hopefully it will last longer than 5 years. Ebook readers should really last for decades, not years, but I guess that's too much to expect from our consumerist, suicidal, extinctionist society.

Configuration hacks Here are the hacks I did on the device. I had done many more hacks on the Kobo Glo HD, but this time I decided to take a more streamlined, minimalist approach that is, hopefully, easier for new users than the pile of hacks I was doing before (which I expand on at the end of the article).

SD card replacement I replaced the SD card. The original card shipped with the Clara HD was 8GB which meant all my books actually fitted on the original, but just barely. The new card is 16GB. Unfortunately, I did this procedure almost at the end of this guide (right before writing the syncthing scripts, below). Next time, that should be the first thing done so the original SD card acts as a pristine copy of the upstream firmware. So even though this seems like an invasive and difficult procedure, I actually do recommend you do it first. The process is basically to:
  1. crack open the Kobo case (don't worry, it sounds awful but I've done it often)
  2. take the SD card out
  3. copy it over to a new, larger card (say on your computer)
  4. put the larger card in
This guide has all the details.
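For reference, the copy step itself can be done with plain dd once both cards are accessible from a computer. A rough, untested sketch (the device names are examples; double-check them with lsblk before writing anything, as the linked guide explains):
# image the original 8GB card (assuming it shows up as /dev/sdX)
dd if=/dev/sdX of=kobo-original.img bs=4M status=progress
# write the image to the new 16GB card (assuming it shows up as /dev/sdY)
dd if=kobo-original.img of=/dev/sdY bs=4M status=progress
# then grow the last (user data) partition to use the extra space,
# e.g. with gparted; keep kobo-original.img as a pristine firmware backup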

Registration bypass hack This guide (from the same author!) has this awesome trick to bypass the annoying registration step. Basically:
  1. pretend you do not have wifi
  2. mount the device
  3. sqlite3 /media/.../KOBOeReader/.kobo/KoboReader.sqlite
  4. INSERT INTO user(UserID,UserKey) VALUES('1','');
  5. unmount the device
More details in the above guide, again.
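Strung together as shell commands, the bypass might look like this (the mount point is an example and varies by system; the guide above remains the reference):
# with wifi "unavailable" and the device mounted over USB:
sqlite3 /media/$USER/KOBOeReader/.kobo/KoboReader.sqlite \
    "INSERT INTO user(UserID,UserKey) VALUES('1','');"
# cleanly unmount before unplugging
umount /media/$USER/KOBOeReader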

Install koreader My e-reader of choice is Koreader. It's just that great. I still don't find the general user interface (i.e. the "file browser") as intuitive as the builtin one, but the book reading just feels better. And anyway, it's the easiest way to get a shell on the device. Follow those instructions, particularly the NickelMenu instructions (see also the NickelMenu home page). Yes, you need to install some other thing to start koreader, which doesn't start on its own. NickelMenu is the simplest and best-integrated one I have found. You might also want to install some dictionaries and configure SSH:
  1. mount USB
  2. drop your SSH public key in .../KOBOeReader/.adds/koreader/settings/SSH/authorized_keys
  3. unmount USB
  4. enable SSH in koreader (Gear -> Network -> SSH -> start SSH)
Note that ed25519 keys do not work: try an RSA key. This might be because koreader ships with dropbear (or an older version), but I haven't verified this.
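Since ed25519 keys do not work, generating a dedicated RSA key and testing the connection could look like this (the key path and device IP are examples):
# generate an RSA key just for the Kobo
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_kobo
# copy ~/.ssh/id_rsa_kobo.pub into .../KOBOeReader/.adds/koreader/settings/SSH/authorized_keys
# then, once SSH is started in koreader:
ssh -p 2222 -i ~/.ssh/id_rsa_kobo root@192.168.0.42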

Install syncthing I use Syncthing to copy all my books into the device now. I was previously using Koreader's OPDS support with Calibre's web interface, but that was clunky and annoying, and I'd constantly have to copy books around. Now the entire collection is synchronized. As a bonus, I can actually synchronise (and backup!) the koreader metadata, since it's stored next to the files. So in theory, this means I could use koreader from multiple devices and have my reading progress sync'd, but I haven't tested that feature just yet. I chose Syncthing because it's simple, lightweight, supported on Linux and Android, and statically compiles by default which means it's easy to deploy on the Kobo. Here is how I installed and started Syncthing at first:
  1. Download the latest version for ARM
  2. extract the archive
  3. copy the syncthing binary into .../KOBOeReader/.adds/
  4. login over SSH (see above on how to enable) with -p 2222 -l root
  5. create the following directory: ~/.config/syncthing/
  6. create the following configuration file, named config.xml:
    <configuration version="18">
        <gui enabled="true" tls="false" debugging="false">
            <address>0.0.0.0:8384</address>
        </gui>
    </configuration>
    
  7. copy a valid ca-certificates.crt file (say from your Linux desktop) into /etc/ssl/certs/ on the Kobo (otherwise syncthing cannot bootstrap discovery servers)
  8. launch syncthing over SSH: /mnt/onboard/.adds/syncthing
You should now be able to connect to the syncthing GUI through your web browser. Immediately change the GUI admin user and password on the Settings: GUI tab. Then, figure out how to start it. Here are your options:
  1. on boot (inittab or whatever). downside: power usage.
  2. on wifi (udev hacks). downside: unreliable (see wallabako).
  3. on demand (e.g. nickel menu, koreader terminal shortcuts). downside: kind of clunky in koreader, did not work in nickel menu.
  4. manually, through shell. downside: requires a shell, but then again we already have one through koreader?
What I have done is to write trivial shell scripts (in .../KOBOeReader/scripts) to start syncthing. The first is syncthing-start.sh:
#!/bin/sh
/mnt/onboard/.adds/syncthing serve &
Then syncthing-stop.sh:
#!/bin/sh
/usr/bin/pkill syncthing
This makes those scripts usable from the koreader file browser. The folder can then be added to the folder shortcuts, and a long-hold on a script will allow you to execute it. I still have to figure out why the NickelMenu script is not working, but it could simply reuse the above scripts to simplify debugging. This is the script I ended up with, in .../KOBOeReader/.adds/nm/syncthing:
menu_item :main    :Syncthing (toggle)    :cmd_spawn         :exec /mnt/onboard/scripts/syncthing-stop.sh
  chain_success:skip:4
    chain_success                      :cmd_spawn          :exec /mnt/onboard/scripts/syncthing-start.sh
    chain_success                      :dbg_toast          :Started Syncthing server
    chain_failure                      :dbg_toast          :Error starting Syncthing server
    chain_always:skip:-1
  chain_success                        :dbg_toast          :Stopped Syncthing server
menu_item :main    :Syncthing (start)    :cmd_output         :exec /mnt/onboard/scripts/syncthing-start.sh
menu_item :main    :Syncthing (stop)    :cmd_output         :exec /mnt/onboard/scripts/syncthing-stop.sh
It's unclear why this doesn't work: I only get "Error starting Syncthing server" for the toggle, and no output for the (start) action. In either case, syncthing doesn't actually start.

Avoided tasks This list wouldn't be complete without listing more explicitly the stuff I have done before on the Kobo Glo HD and which I have deliberately decided not to do here because my time is precious:
  • plato install: beautiful project, but koreader is good enough
  • wallabako setup: too much work to maintain, Wallabag articles are too distracting and available on my phone anyways
  • using calibre to transfer books: not working half the time, different file layout than the source, one less Calibre dependency
  • using calibre to generate e-books based on RSS feeds (yes, I did that, and yes, it was pretty bad and almost useless)
  • SSH support: builtin to koreader
Now maybe I'll have time to actually read a book...

16 July 2021

Jamie McClelland: From Ikiwiki to Hugo

Back in the days of Etch, I converted this blog from Drupal to ikiwiki. I remember being very excited about this brand new concept of static web sites derived from content stored in a version control system. And now over a decade later I've moved to hugo. I feel some loyalty to ikiwiki and Joey Hess for opening my eyes to the static web site concept. But ultimately I grew tired of splitting my time and energy between learning ikiwiki and hugo, which has been my tool of choice for new projects. When I started getting strange emails that I suspect had something to do with spammers filling out ikiwiki's commenting registration system, I chose to invest my time in switching to hugo over debugging and really understanding how ikiwiki handles user registration. I carefully reviewed anarcat's blog on converting from ikiwiki to hugo and learned about a lot of ikiwiki features I am not using. Wow, it's times like these that I'm glad I keep it really simple. Based on the various ikiwiki2hugo python scripts I studied, I eventually wrote a far simpler one tailored to my needs. Also, in what could only be called a desperate act of procrastination combined with a touch of self-hatred (it's been a rough week) I rejected all the commenting options available to me and chose to implement my own in PHP. What?!?! Why would anyone do such a thing? I refer you to my previous sentence about desperate procrastination. And also I know it's fashionable to hate PHP, but honestly as the first programming language I learned, there is something comforting and familiar about it. And, on a more objective level, I can deploy it easily to just about any hosting provider in the world. I don't have to maintain a unicorn service or a nodejs service and make special configuration entries in my web configuration. All I have to do is upload the php files and I'm done. Well, I'm sure I'll regret this decision. Special thanks to Alexander Bilz for the anatole hugo theme. I chose it via a nearly random click to avoid the rabbit hole of choosing a theme. And, by luck, it has turned out quite well. I only had to override the commento partial theme page to hijack it for my own commenting system's use.

29 June 2021

Antoine Beaupré: Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash It is not really worth going over the crash in detail; it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs. I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:
95K emails scanned
Oops. You can see notmuch happily noticing the destroyed files on the server:
jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.
The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted. The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:
jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 
So clearly that is the source of the bug.

Recovery Emergency stop on curie:
notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
On marcos (the server), guessed the number of messages delivered since the last backup to be 71, just looking at timestamps in the mail log. Made a list:
grep postfix/local /var/log/mail.log | tail -71 > lost-mail
Found postfix queue IDs:
sed 's/.*\]://;s/:.*//' lost-mail > qids
Turn those into message IDs, find those that are missing from the disk (I had previously run notmuch new just to be sure it's up to date):
while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid
done
Copy this back on curie as missing-msgids and:
$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done < missing-msgids
mailman.189.1624881611.23397.nodes-reseaulibre.ca@reseaulibre.ca
AnwMy7rdSpK-N-vt4AiOag@ismtpd0148p1mdw1.sendgrid.net
only two mails missing! whoohoo! Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:
~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2
I actually remembered deleting those, so no mail lost! Rebuild the list of msgids that were lost, on marcos:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//'
Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:restore/Maildir-angela/
If that looks about right, on marcos:
find restore/Maildir-angela/ -type f | wc -l
... should match the number of missing mails, roughly. Copy them into the real spool:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:Maildir/
Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done
Then, crucial moment, try to pull the new mails from the backups on curie:
anarcat@curie:~(main)$ smd-pull  -n  --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs
This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catch up with all the mail I had read over the holidays (1440 mails!), but thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:
pv notmuch.dump | notmuch restore
Phew!

Workaround I have filed this as a bug in upstream issue 18. Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...

Fix At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but never destroy my entire mail spool that way. As mentioned before, there are other programs that sync mail. I'm looking at:
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, see comment below
  • isync/mbsync: might be faster, I remember having trouble switching from offlineimap to this, has support for TLS client certs, running over SSH, and generally has good words from multiple Debian and notmuch people (a configuration sketch for the SSH approach follows this list)
  • getmail: just downloads email, might not be enough
  • nncp: treat the local spool as another mail server, might not be compatible with my "multiple clients" setup
  • doveadm-sync: requires dovecot on both ends, but supports using SSH to sync, will try this next, may have performance problems, see comment below
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, seems awfully complicated to setup, mentions rsmtp which is a nice name for rsendmail
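To illustrate the "running over SSH" approach mentioned for isync/mbsync above, here is a rough, untested configuration sketch that tunnels IMAP through SSH to Dovecot's pre-authentication imap binary, so no IMAP password needs to be stored on disk. The host name, store names, and paths are placeholders, and the Far/Near syntax assumes isync 1.4 or later:
# ~/.mbsyncrc -- hypothetical sketch, not a tested configuration
IMAPAccount remote
# use dovecot's pre-auth imap binary over SSH instead of a password
Tunnel "ssh -q mail.example.com /usr/lib/dovecot/imap"

IMAPStore remote-store
Account remote

MaildirStore local-store
Path ~/Maildir/
Inbox ~/Maildir/INBOX
SubFolders Verbatim

Channel mail
Far :remote-store:
Near :local-store:
Patterns *
Create Both
SyncState *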

27 May 2021

Michael Prokop: What to expect from Debian/bullseye #newinbullseye

Bullseye Banner, Copyright 2020 Juliette Taka Debian v11 with codename bullseye is supposed to be released as new stable release soon-ish (let's hope for June, 2021! :)). Similar to what we had with #newinbuster and previous releases, now it's time for #newinbullseye! I was the driving force at several of my customers to be well prepared for bullseye before its freeze, and since then we're on good track there overall. In my opinion, Debian's release team did (and still does) a great job; I'm very happy about how unblock requests (not only mine but also ones I kept an eye on) were handled so far. As usual with major upgrades, there are some things to be aware of, and hereby I'm starting my public notes on bullseye that might be worth also for other folks. My focus is primarily on server systems and looking at things from a sysadmin perspective. Further readings Of course start with taking a look at the official Debian release notes, make sure to especially go through What's new in Debian 11 + Issues to be aware of for bullseye. Chris published notes on upgrading to Debian bullseye, and also anarcat published upgrade notes for bullseye. Package versions As a starting point, let's look at some selected packages and their versions in buster vs. bullseye as of 2021-05-27 (mainly having amd64 in mind):
Package buster/v10 bullseye/v11
ansible 2.7.7 2.10.8
apache 2.4.38 2.4.46
apt 1.8.2.2 2.2.3
bash 5.0 5.1
ceph 12.2.11 14.2.20
docker 18.09.1 20.10.5
dovecot 2.3.4 2.3.13
dpkg 1.19.7 1.20.9
emacs 26.1 27.1
gcc 8.3.0 10.2.1
git 2.20.1 2.30.2
golang 1.11 1.15
libc 2.28 2.31
linux kernel 4.19 5.10
llvm 7.0 11.0
lxc 3.0.3 4.0.6
mariadb 10.3.27 10.5.10
nginx 1.14.2 1.18.0
nodejs 10.24.0 12.21.0
openjdk 11.0.9.1 11.0.11+9 + 17~19
openssh 7.9p1 8.4p1
openssl 1.1.1d 1.1.1k
perl 5.28.1 5.32.1
php 7.3 7.4+76
postfix 3.4.14 3.5.6
postgres 11 13
puppet 5.5.10 5.5.22
python2 2.7.16 2.7.18
python3 3.7.3 3.9.2
qemu/kvm 3.1 5.2
ruby 2.5.1 2.7+2
rust 1.41.1 1.48.0
samba 4.9.5 4.13.5
systemd 241 247.3
unattended-upgrades 1.11.2 2.8
util-linux 2.33.1 2.36.1
vagrant 2.2.3 2.2.14
vim 8.1.0875 8.2.2434
zsh 5.7.1 5.8
Linux Kernel The bullseye release will ship a Linux kernel based on v5.10 (v5.10.28 as of 2021-05-27, with v5.10.38 pending in unstable/sid), whereas buster shipped kernel 4.19. As usual there are plenty of changes in the kernel area and this might warrant a separate blog entry, but to highlight some issues: One surprising change might be that the scrollback buffer (Shift + PageUp) is gone from the Linux console. Make sure to always use screen/tmux or handle output through a pager of your choice if you need all of it and you're in the console. The kernel provides BTF support (via CONFIG_DEBUG_INFO_BTF, see #973870), which means it's no longer necessary to install LLVM, Clang, etc (requiring >100MB of disk space); see Gregg's excellent blog post regarding the underlying rationale. Sadly the libbpf-tools packaging didn't make it into bullseye (#978727), but if you want to use your own self-made Debian packages, my notes might be useful. With kernel version 5.4, SUBDIRS support was removed from kbuild, so if an out-of-tree kernel module (like a *-dkms package) fails to compile on bullseye, make sure to use a recent version of it which uses M= or KBUILD_EXTMOD= instead. Unprivileged user namespaces are enabled by default (see #898446 + #987777), so programs can create more restricted sandboxes without the need to run as root or via a setuid-root helper. If you prefer to keep this feature restricted (or tools like web browsers, WebKitGTK, Flatpak, don't work), use sysctl -w kernel.unprivileged_userns_clone=0. The /boot/System.map file(s) no longer provide the actual data, you need to switch to the dbg package if you rely on that information:
% cat /boot/System.map-5.10.0-6-amd64 
ffffffffffffffff B The real System.map is in the linux-image-<version>-dbg package
Be aware though, that the *-dbg package requires ~5GB of additional disk space. Systemd systemd v247 made it into bullseye (updated from v241). Same as for the kernel this might warrant a separate blog entry, but to mention some highlights: Systemd in bullseye activates its persistent journal functionality by default (storing its files in /var/log/journal/, see #717388). systemd-timesyncd is no longer part of the systemd binary package itself, but available as standalone package. This allows usage of ntp, chrony, openntpd, without having systemd-timesyncd installed (which prevents race conditions like #889290, which was biting me more than once). journalctl gained new options:
--cursor-file=FILE      Show entries after cursor in FILE and update FILE
--facility=FACILITY...  Show entries with the specified facilities
--image=IMAGE           Operate on files in filesystem image
--namespace=NAMESPACE   Show journal data from specified namespace
--relinquish-var        Stop logging to disk, log to temporary file system
--smart-relinquish-var  Similar, but NOP if log directory is on root mount
systemctl gained new options:
clean UNIT...                       Clean runtime, cache, state, logs or configuration of unit
freeze PATTERN...                   Freeze execution of unit processes
thaw PATTERN...                     Resume execution of a frozen unit
log-level [LEVEL]                   Get/set logging threshold for manager
log-target [TARGET]                 Get/set logging target for manager
service-watchdogs [BOOL]            Get/set service watchdog state
--with-dependencies                 Show unit dependencies with 'status', 'cat', 'list-units', and 'list-unit-files'
 -T --show-transaction              When enqueuing a unit job, show full transaction
 --what=RESOURCES                   Which types of resources to remove
--boot-loader-menu=TIME             Boot into boot loader menu on next boot
--boot-loader-entry=NAME            Boot into a specific boot loader entry on next boot
--timestamp=FORMAT                  Change format of printed timestamps
If you use systemctl edit to adjust overrides, then you'll now also get the existing configuration file listed as a comment, which I consider very helpful. The MACAddressPolicy behavior with systemd naming schema v241 changed for virtual devices (I plan to write about this in a separate blog post). There are plenty of new manual pages: systemd also gained new unit configurations related to security hardening: Another new unit configuration is SystemCallLog=, which supports listing the system calls to be logged. This is very useful for auditing, or temporarily when constructing system call filters. The cgroupv2 change is also documented in the release notes, but to explicitly mention it also here, quoting from /usr/share/doc/systemd/NEWS.Debian.gz:
systemd now defaults to the unified cgroup hierarchy (i.e. cgroupv2).
This change reflects the fact that cgroups2 support has matured
substantially in both systemd and in the kernel.
All major container tools nowadays should support cgroupv2.
If you run into problems with cgroupv2, you can switch back to the previous,
hybrid setup by adding systemd.unified_cgroup_hierarchy=false to the
kernel command line.
You can read more about the benefits of cgroupv2 at
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
Note that cgroup-tools (lssubsys + lscgroup etc) don't work in the cgroup2/unified hierarchy yet (see #959022 for the details). Configuration management puppet's upstream doesn't provide packages for bullseye yet (see PA-3624 + MODULES-11060), and sadly neither v6 nor v7 made it into bullseye, so when using the packages from Debian you're still stuck with v5.5 (also see #950182). ansible is also available, and while it looked like only version 2.9.16 would make it into bullseye (see #984557 + #986213), actually version 2.10.8 made it into bullseye. chef was removed from Debian and is not available with bullseye (due to trademark issues). Prometheus stack Prometheus server was updated from v2.7.1 to v2.24.1, and the prometheus service by default applies some systemd hardening now. Also all the usual exporters are still there, but bullseye also gained some new ones: Virtualization docker (v20.10.5), ganeti (v3.0.1), libvirt (v7.0.0), lxc (v4.0.6), openstack, qemu/kvm (v5.2), xen (v4.14.1), are all still around, though what's new and noteworthy is that podman version 3.0.1 (tool for managing OCI containers and pods) made it into bullseye. If you're using the docker packages from upstream, be aware that they still don't seem to understand Debian package version handling. The docker* packages will not be automatically considered for upgrade, as 5:20.10.6~3-0~debian-buster is considered newer than 5:20.10.6~3-0~debian-bullseye:
% apt-cache policy docker-ce
  docker-ce:
    Installed: 5:20.10.6~3-0~debian-buster
    Candidate: 5:20.10.6~3-0~debian-buster
    Version table:
   *** 5:20.10.6~3-0~debian-buster 100
          100 /var/lib/dpkg/status
       5:20.10.6~3-0~debian-bullseye 500
          500 https://download.docker.com/linux/debian bullseye/stable amd64 Packages
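Until upstream fixes its version scheme, one possible workaround is to request the bullseye build explicitly (version string copied from the policy output above; adjust it to whatever apt-cache policy reports on your system):
% apt-get install docker-ce=5:20.10.6~3-0~debian-bullseye
Alternatively, an apt preferences pin matching the ~debian-bullseye versions with a priority above 1000 would make apt prefer the bullseye builds permanently, since priorities above 1000 allow apt to select a "lower" version.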
Vagrant is available in version 2.2.14, and the package from upstream works perfectly fine on bullseye as well. If you're relying on VirtualBox, be aware that upstream doesn't provide packages for bullseye yet, but the package from Debian/unstable (v6.1.22 as of 2021-05-27) works fine on bullseye (VirtualBox hasn't been shipped with stable releases for quite some time due to lack of cooperation from upstream on security support for older releases, see #794466). If you rely on the virtualbox-guest-additions-iso and its shared folders support, you might be glad to hear that v6.1.22 made it into bullseye (see #988783), properly supporting more recent kernel versions like the one present in bullseye. debuginfod There's a new service debuginfod.debian.net (see debian-devel-announce and Debian Wiki), which makes the debugging experience way smoother. You no longer need to download the debugging Debian packages (*-dbgsym/*-dbg), but instead can fetch them on demand, by exporting the following variables (before invoking gdb or the like):
% export DEBUGINFOD_PROGRESS=1    # for optional download progress reporting
% export DEBUGINFOD_URLS="https://debuginfod.debian.net"
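A quick usage sketch, assuming a gdb built with debuginfod support (as in bullseye): after exporting the variables, run gdb as usual and it fetches the matching debug info on demand instead of requiring the *-dbgsym packages:
% export DEBUGINFOD_URLS="https://debuginfod.debian.net"
% gdb -q /usr/bin/bash
# with DEBUGINFOD_URLS set, gdb downloads source and debug info as needed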
BTW: if you can't rely on debuginfod (for whatever reason), I'd like to point your attention towards find-dbgsym-packages from the debian-goodies package. Vim Sadly Vim 8.2 once again makes another change for bad defaults (hello mouse behavior!). When incsearch is set, it also applies to :substitute. This makes it veeeeeeeeeery annoying when running something like :%s/\s\+$// to get rid of trailing whitespace characters, because if there are no matches it jumps to the beginning of the file and then back, sigh. To get the old behavior back, you can use this:
au CmdLineEnter : let s:incs = &incsearch | set noincsearch
au CmdLineLeave : let &incsearch = s:incs
rsync rsync was updated from v3.1.3 to v3.2.3. It provides various checksum enhancements (see option --checksum-choice). We got new capabilities (hardlink-specials, atimes, optional protect-args, stop-at, no crtimes) and the addition of zstd and lz4 compression algorithms. And we got new options: OpenSSH OpenSSH was updated from v7.9p1 to 8.4p1, so if you're interested in all the changes, check out the release notes between those versions (8.0, 8.1, 8.2, 8.3 + 8.4). Let's highlight some notable new features: Misc unsorted

24 May 2021

Antoine Beaupré: Leaving Freenode

The freenode IRC network has been hijacked. TL;DR: move to libera.chat or OFTC.net, as did countless free software projects including Gentoo, CentOS, KDE, Wikipedia, FOSDEM, and more. Debian and the Tor project were already on OFTC and are not affected by this.

What is freenode and why should I care? freenode is the largest remaining IRC network. Before this incident, it had close to 80,000 users, which is small in terms of modern internet history -- even small social networks are larger by multiple orders of magnitude -- but is large in IRC history. The IRC network is also extensively used by the free software community, being the default IRC network on many programs, and used by hundreds if not thousands of free software projects. I have been using freenode since at least 2006. This matters if you care about IRC, the internet, open protocols, decentralisation, and, to a certain extent, federation as well. It also touches on who has the right on network resources: the people who "own" it (through money) or the people who make it work (through their labor). I am biased towards open protocols, the internet, federation, and worker power, and this might taint this analysis.

What happened? It's a long story, but basically:
  1. back in 2017, the former head of staff sold the freenode.net domain (and its related company) to Andrew Lee, "American entrepreneur, software developer and writer", and, rather weirdly, supposedly "crown prince of Korea" although that part is kind of complex (see House of Yi, Yi Won, and Yi Seok). It should be noted the Korean Empire hasn't existed for over a century at this point (even though its flag, also weirdly, remains)
  2. back then, this was only known to the public as this strange PIA and freenode joining forces gimmick. it was suspicious at first, but since the network kept running, no one paid much attention to it. opers of the network were similarly reassured that Lee would have no say in the management of the network
  3. this all changed recently when Lee asserted ownership of the freenode.net domain and started meddling in the operations of the network, according to this summary. this part is disputed, but it is corroborated by almost a dozen former staff which collectively resigned from the network in protest, after legal threats, when it was obvious freenode was lost.
  4. the departing freenode staff founded a new network, irc.libera.chat, based on the new ircd they were working on with OFTC, solanum
  5. meanwhile, bot armies started attacking all IRC networks: both libera and freenode, but also OFTC and unrelated networks like a small one I help operate. those attacks have mostly stopped as of this writing (2021-05-24 17:30UTC)
  6. on freenode, however, things are going for the worse: Lee has been accused of taking over a channel, in a grotesque abuse of power; then changing freenode policy to not only justify the abuse, but also remove rules against hateful speech, effectively allowing nazis on the network (update: the change was reverted, but not by Lee)
Update: even though the policy change was reverted, the actual conversations allowed on freenode have already degenerated into toxic garbage. There are also massive channel takeovers (presumably over 700), mostly on channels that were redirecting to libera, but also channels that were still live. Channels that were taken over include #fosdem, #wikipedia, #haskell... Instead of working on the network, the new "so-called freenode" staff is spending effort writing bots and patches to basically automate taking over channels. I run an IRC network and this bot is obviously not standard "services" stuff... This is just grotesque. At this point I agree with this HN comment:
We should stop implicitly legitimizing Andrew Lee's power grab by referring to his dominion as "Freenode". Freenode is a quarter-century-old community that has changed its name to libera.chat; the thing being referred to here as "Freenode" is something else that has illegitimately acquired control of Freenode's old servers and user database, causing enormous inconvenience to the real Freenode.
I don't agree with the suggested name there, let's instead call it "so called freenode" as suggested later in the thread.

What now? I recommend people and organisations move away from freenode as soon as possible. This is a major change: documentation needs to be fixed, and the migration needs to be coordinated. But I do not believe we can trust the new freenode "owners" to operate the network reliably and in good faith. It's also important to use the current momentum to build a critical mass elsewhere so that people don't end up on freenode again by default and find an even more toxic community than your typical run-of-the-mill free software project (which is already not a high bar to meet). Update: people are moving to libera in droves. It's now reaching 18,000 users, which is bigger than OFTC and getting close to the largest, traditional IRC networks (EFnet, Undernet, IRCnet are in the 10-20k users range). so-called freenode is still larger, currently clocking 68,000 users, but that's a huge drop from the previous count which was 78,000 before the exodus began. We're even starting to see the effects of the migration on netsplit.de. Update 2: the isfreenodedeadyet.com site is updated more frequently than netsplit and shows tons more information. It shows 25k online users for libera and 61k for so-called freenode (down from ~78k), and the trend doesn't seem to be stopping for so-called freenode. There's also a list of 400+ channels that have moved out. Keep in mind that such migrations take effect over long periods of time.

Where do I move to? The first thing you should do is to figure out which tool to use for interactive user support. There are multiple alternatives, of course -- this is the internet after all -- but here is a short list of suggestions, in preferred priority order:
  1. irc.libera.chat
  2. irc.OFTC.net
  3. Matrix.org, which bridges with OFTC and (hopefully soon) with libera as well, modern IRC alternative
  4. XMPP/Jabber also still exists, if you're into that kind of stuff, but I don't think the "chat room" story is great there, at least not as good as Matrix
Basically, the decision tree is this:
  • if you want to stay on IRC:
    • if you are already on many OFTC channels and few freenode channels: move to OFTC
    • if you are more inclined to support the previous freenode staff: move to libera
    • if you care about matrix users (in the short term): move to OFTC
  • if you are ready to leave IRC:
    • if you want the latest and greatest: move to Matrix
    • if you like XML and already use XMPP: move to XMPP
Frankly, at this point, everyone should seriously consider moving to Matrix. The user story is great, the web is a first class user, it supports E2EE (although XMPP as well), and has a lot of momentum behind it. It even bridges with IRC well (which is not the case for XMPP) so if you're worried about problems like this happening again. (Indeed, I wouldn't be surprised if similar drama happens on OFTC or libera in the future. The history of IRC is full of such epic controversies, takeovers, sabotage, attacks, technical flamewars, and other silly things. I am not sure, but I suspect a federated model like Matrix might be more resilient to conflicts like this one.) Changing protocols might mean losing a bunch of users however: not everyone is ready to move to Matrix, for example. Graybeards like me have been using irssi for years, if not decades, and would take quite a bit of convincing to move elsewhere. I have mostly kept my channels on IRC, and moved either to OFTC or libera. In retrospect, I think I might have moved everything to OFTC if I would have thought about it more, because almost all of my channels are there. But I kind of expect a lot of the freenode community to move to libera, so I am keeping a socket open there anyways.

How do I move? The first thing you should do is to update documentation, websites, and source code to stop pointing at freenode altogether. This is what I did for feed2exec, for example. You need to let people know in the current channel as well, and possibly shutdown the channel on freenode. Since my channels are either small or empty, I took the radical approach of:
  • redirecting the channel to ##unavailable which is historically the way we show channels have moved to another network
  • make the channel invite-only (which effectively enforces the redirection)
  • kicking everyone out of the channel
  • kickban people who rejoin
  • set the topic to announce the change
In IRC speak, the following commands should do all this:
/msg ChanServ set #anarcat mlock +if ##unavailable
/msg ChanServ clear #anarcat users moving to irc.libera.chat
/msg ChanServ set #anarcat restricted on
/topic #anarcat this channel has moved to irc.libera.chat
If the channel is not registered, the following might work
/mode #anarcat +if ##unavailable
Then you can leave freenode altogether:
/disconnect Freenode unacceptable hijack, policy changes and takeovers. so long and thanks for all the fish.
Keep in mind that some people have been unable to set up such redirections, because the new freenode staff have taken over their channel, in which case you're out of luck... Some people have expressed concern about their private data hosted at freenode as well. If you care about this, you can always talk to NickServ and DROP your nick. Be warned, however, that this assumes good faith of the network operators, which, at this point, is kind of futile. I would assume any data you have registered on there (typically: your NickServ password and email address) to be compromised and leaked. If your password is used elsewhere (tsk, tsk), change it everywhere. Update: there's also another procedure, similar to the above, but with a different approach. Keep in mind that so-called freenode staff are actively hijacking channels for the mere act of mentioning libera in the channel topic, so tread carefully there.

Last words This is a sad time for IRC in general, and freenode in particular. It's a real shame that the previous freenode staff have been kicked out, and it's especially horrible that the new policies of the network are basically making the network open to nazis. I wish things had gone differently: now we have yet another fork in IRC history. While it's not the first time freenode changes name (it was called OPN before), now the old freenode is still around and this will bring much confusion to the world, especially since the new freenode staff is still claiming to support FOSS. I understand there are many sides to this story, and some people were deeply hurt by all this. But for me, it's completely unacceptable to keep pushing your staff so hard that they basically all (except one?) resign in protest. For me, that's leadership failure at the utmost, and a complete disgrace. And of course, I can't in good conscience support or join a network that allows hate speech. Regardless of the fate of whatever we'll call what's left of freenode, maybe it's time for this old IRC thing to die already. It's still a sad day in internet history, but then again, maybe IRC will never die...

25 April 2021

Antoine Beaupré: Lost article ideas

I wrote for LWN for about two years. During that time, I wrote (what seems to me an impressive) 34 articles, but I always had a pile of ideas in the back of my mind. Those are ideas, notes, and scribbles lying around. Some were just completely abandoned because they didn't seem a good fit for LWN. Concretely, I stored those in branches in a git repository, and used the branch name (and, naively, the last commit log) as indicators of the topic. This was the state of affairs when I left:
remotes/private/attic/novena                    822ca2bb add letter i sent to novena, never published
remotes/private/attic/secureboot                de09d82b quick review, add note and graph
remotes/private/attic/wireguard                 5c5340d1 wireguard review, tutorial and comparison with alternatives
remotes/private/backlog/dat                     914c5edf Merge branch 'master' into backlog/dat
remotes/private/backlog/packet                  9b2c6d1a ham radio packet innovations and primer
remotes/private/backlog/performance-tweaks      dcf02676 config notes for http2
remotes/private/backlog/serverless              9fce6484 postponed until kubecon europe
remotes/private/fin/cost-of-hosting             00d8e499 cost-of-hosting article online
remotes/private/fin/kubecon                     f4fd7df2 remove published or spun off articles
remotes/private/fin/kubecon-overview            21fae984 publish kubecon overview article
remotes/private/fin/kubecon2018                 1edc5ec8 add series
remotes/private/fin/netconf                     3f4b7ece publish the netconf articles
remotes/private/fin/netdev                      6ee66559 publish articles from netdev 2.2
remotes/private/fin/pgp-offline                 f841deed pgp offline branch ready for publication
remotes/private/fin/primes                      c7e5b912 publish the ROCA paper
remotes/private/fin/runtimes                    4bee1d70 prepare publication of runtimes articles
remotes/private/fin/token-benchmarks            5a363992 regenerate timestamp automatically
remotes/private/ideas/astropy                   95d53152 astropy or python in astronomy
remotes/private/ideas/avaneya                   20a6d149 crowdfunded blade-runner-themed GPLv3 simcity-like simulator
remotes/private/ideas/backups-benchmarks        fe2f1f13 review of backup software through performance and features
remotes/private/ideas/cumin                     7bed3945 review of the cumin automation tool from WM foundation
remotes/private/ideas/future-of-distros         d086ca0d modern packaging problems and complex apps
remotes/private/ideas/on-dying                  a92ad23f another dying thing
remotes/private/ideas/openpgp-discovery         8f2782f0 openpgp discovery mechanisms (WKD, etc), thanks to jonas meurer
remotes/private/ideas/password-bench            451602c0 bruteforce estimates for various password patterns compared with RSA key sizes
remotes/private/ideas/prometheus-openmetrics    2568dbd6 openmetrics standardizing prom metrics enpoints
remotes/private/ideas/telling-time              f3c24a53 another way of telling time
remotes/private/ideas/wallabako                 4f44c5da talk about wallabako, read-it-later + kobo hacking
remotes/private/stalled/bench-bench-bench       8cef0504 benchmarking http benchmarking tools
remotes/private/stalled/debian-survey-democracy 909bdc98 free software surveys and debian democracy, volunteer vs paid work
Wow, what a mess! Let's see if I can make sense of this:

Attic Those are articles that I thought about, then finally rejected, either because it didn't seem worth it, or my editors rejected it, or I just moved on:
  • novena: the project is ooold now, didn't seem to fit a LWN article. it was basically "how can i build my novena now" and "you guys rock!" it seems like the MNT Reform is the brainchild of the Novena now, and I dare say it's even cooler!
  • secureboot: my LWN editors were critical of my approach, and probably rightly so - it's a really complex subject and I was probably out of my depth... it's also out of date now, we did manage secureboot in Debian
  • wireguard: LWN ended up writing extensive coverage, and I was biased against Donenfeld because of conflicts in a previous project

Backlog Those were articles I was planning to write about next.
  • dat: I already had written Sharing and archiving data sets with Dat, but it seems I had more to say... mostly performance issues, beaker, no streaming, limited adoption... to be investigated, I guess?
  • packet: a primer on data communications over ham radio, and the cool new tech that has emerged in the free software world. those are mainly notes about Pat, Direwolf, APRS and so on... just never got around to making sense of it or really using the tech...
  • performance-tweaks: "optimizing websites at the age of http2", the unwritten story of the optimization of this website with HTTP/2 and friends
  • serverless: god. one of the leftover topics at Kubecon, my notes on this were thin, and the actual subject, possibly even thinner... the only lie worse than the cloud is that there's no server at all! concretely, that's a pile of notes about Kubecon which I wanted to sort through. Probably belongs in the attic now.

Fin Those are finished articles, they were published on my website and LWN, but the branches were kept because previous drafts had private notes that should not be published.

Ideas A lot of those branches were actually just an empty commit, with the commitlog being the "pitch", more or less. I'd send that list to my editors, sometimes with a few more links (basically the above), and they would nudge me one way or the other. Sometimes they would actively discourage me to write about something, and I would do it anyways, send them a draft, and they would patiently make me rewrite it until it was a decent article. This was especially hard with the terminal emulator series, which took forever to write and even got my editors upset when they realized I had never installed Fedora (I ended up installing it, and I was proven wrong!)

Stalled Oh, and then there's those: those are either "ideas" or "backlog" that got so far behind that I just moved them out of the way because I was tired of seeing them in my list.
  • stalled/bench-bench-bench benchmarking http benchmarking tools, a horrible mess of links, copy-paste from terminals, and ideas about benchmarking... some of this trickled out into this benchmarking guide at Tor, but not much more than the list of tools
  • stalled/debian-survey-democracy: "free software surveys and Debian democracy, volunteer vs paid work"... A long-standing concern of mine is that all Debian work is supposed to be volunteer, and paying explicitly for work inside Debian has traditionally been frowned upon, even leading to serious drama and dissent (remember Dunc-Tank?). Back when I was writing for LWN, I was also doing paid work for Debian LTS. I also learned that a lot of (most?) Debian Developers were actually being paid by their job to work on Debian. So I was confused by this apparent contradiction, especially given how the LTS project has been mostly accepted, while Dunc-Tank was not... See also this talk at Debconf 16. I had hopes that this study would show the "hunch" people have offered (that most DDs are paid to work on Debian) but it seems to show the reverse (only 36% of DDs, and 18% of all respondents, are paid). So I am still confused and worried about the sustainability of Debian.

What do you think? So that's all I got. As people might have noticed here, I have much less time to write these days, but if there's any subject in there I should pick, what is the one that you would find most interesting? Oh! and I should mention that you can write to LWN! If you think people should know more about some Linux thing, you can get paid to write for it! Pitch it to the editors, they won't bite. The worst that can happen is that they say "yes" and there goes two years of your life learning to write. Because no, you don't know how to write, no one does. You need an editor to write. That's why this article looks like crap and has a smiley. :)

24 April 2021

Antoine Beaupré: A dead game clock

Time flies. Back in 2008, I wrote a game clock. Since then, what was first called "chess clock" was renamed to pychessclock and then Gameclock (2008). It shipped with Debian 6 squeeze (2011), 7 wheezy (4.0, 2013, with a new UI), 8 jessie (5.0, 2015, with a code cleanup, translation, go timers), 9 stretch (2017), and 10 buster (2019), phew! Eight years in Debian over 4 releases, not bad! But alas, Debian 11 bullseye (2021) won't ship with Gameclock because both Python 2 and GTK 2 were removed from Debian. I lack the time, interest, and energy to port this program. Even if I could find the time, everyone is on their phone nowadays. So finding the right toolkit would require some serious thinking about how to make a portable program that can run on Linux and Android. I care less about Mac, iOS, and Windows, but, interestingly, it feels like it wouldn't be much harder to cover those as well if I hit both Linux and Android (which is already hard enough, paradoxically). (And before you ask, no, Java is not an option for me thanks. If I switch to anything other than Python, it would be Golang or Rust. And I did look at some toolkit options a few years ago, and was excited by none.) So there you have it: that is how software dies, I guess. Alternatives include: PS: Monkeysign also suffered the same fate, for what that's worth. Alternatives include caff, GNOME Keysign, and pius. Note that this does not affect the larger Monkeysphere project, which will ship with Debian bullseye.

23 March 2021

Antoine Beaupr : Major email crash with syncmaildir

TL;DR: lost half my mail (150,000 messages, ~6GB) last night. Cause uncertain, but possibly a combination of a dead CMOS battery, systemd OnCalendar=daily, a (locking?) bug in syncmaildir, and, generally, a system too exotic and complicated.

The crash So I somehow lost half my mail:
anarcat@angela:~(main)$ du -sh Maildir/
7,9G    Maildir/
anarcat@curie:~(main)$ du -sh Maildir
14G     Maildir
anarcat@marcos:~$ du -sh Maildir
8,0G    Maildir
Those are three different machines:
  • angela: my laptop, not always on
  • curie: my workstation, mostly always on
  • marcos: my mail server, always on
Those mails are synchronized using a rather exotic system based on SSH, syncmaildir and rsendmail. The anomaly started on curie:
-- Reboot --
mar 22 16:13:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:13:00 curie smd-pull[4801]: rm: impossible de supprimer '/home/anarcat/.smd/workarea/Maildir': Le dossier n'est pas vide
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:13:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:14:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:14:00 curie smd-pull[7025]:  4091 ?        00:00:00 smd-push
mar 22 16:14:00 curie smd-pull[7025]: Already running.
mar 22 16:14:00 curie smd-pull[7025]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:14:00 curie smd-pull[7025]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 4091) run(rm /home/anarcat/.smd/lock))
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:14:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
Then it seems like smd-push (from curie) started destroying the universe for some reason:
mar 22 16:20:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:20:00 curie smd-pull[9319]:  4091 ?        00:00:00 smd-push
mar 22 16:20:00 curie smd-pull[9319]: Already running.
mar 22 16:20:00 curie smd-pull[9319]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:20:00 curie smd-pull[9319]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(ru
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:20:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:21:34 curie smd-push[4091]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:21:35 curie smd-push[9374]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:21:35 curie systemd[3199]: smd-push.service: Succeeded.
Notice the del-mails(293920) there: it is actively trying to destroy basically every email in my mail spool. Then somehow push and pull started both at once:
mar 22 16:21:35 curie systemd[3199]: Started push emails with syncmaildir.
mar 22 16:21:35 curie systemd[3199]: Starting push emails with syncmaildir...
mar 22 16:22:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:22:00 curie smd-pull[10333]:  9455 ?        00:00:00 smd-push
mar 22 16:22:00 curie smd-pull[10333]: Already running.
mar 22 16:22:00 curie smd-pull[10333]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:22:00 curie smd-pull[10333]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: Data transmission failed.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marco
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open requested file.
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:22:00 curie smd-push[9455]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
There it seems push tried to destroy the universe again: del-mails(293920). Interestingly, the push started again in parallel with the pull, right that minute:
mar 22 16:22:00 curie systemd[3199]: Starting push emails with syncmaildir...
... but didn't complete for a while, here's pull trying to start again:
mar 22 16:24:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:24:00 curie smd-pull[12051]: 10466 ?        00:00:00 smd-push
mar 22 16:24:00 curie smd-pull[12051]: Already running.
mar 22 16:24:00 curie smd-pull[12051]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:24:00 curie smd-pull[12051]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 10466) run(rm /home/anarcat/.smd/lock))
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
... and the long push finally resolving:
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: No such file or directory
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open requested file.
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie smd-push[10466]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
mar 22 16:24:00 curie systemd[3199]: Starting push emails with syncmaildir...
This pattern repeats until 16:35, when that locking issue silently recovered somehow:
mar 22 16:35:03 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:35:41 curie smd-pull[20788]: default: smd-client@localhost: TAGS: stats::new-mails(5), del-mails(1), bytes-received(21885), xdelta-received(6863398)
mar 22 16:35:42 curie smd-pull[21373]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:35:42 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:35:42 curie systemd[3199]: Started pull emails with syncmaildir.
mar 22 16:36:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:36:36 curie smd-pull[21738]: default: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(214)
mar 22 16:36:37 curie smd-pull[21816]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:36:37 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:36:37 curie systemd[3199]: Started pull emails with syncmaildir.
... notice that huge xdelta-received there, that's 7GB right there. Mysteriously, the curie mail spool survived this, possibly because smd-pull started failing again:
mar 22 16:38:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:38:00 curie smd-pull[23556]: 21887 ?        00:00:00 smd-push
mar 22 16:38:00 curie smd-pull[23556]: Already running.
mar 22 16:38:00 curie smd-pull[23556]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:38:00 curie smd-pull[23556]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 21887) run(rm /home/anarcat/.smd/lock))
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:38:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
That could have been when I got on angela to check my mail, and it was busy doing the nasty removal stuff... although the times don't match. Here is when angela came back online:
anarcat@angela:~(main)$ last
anarcat  :0           :0               Mon Mar 22 19:57   still logged in
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 19:57   still running
anarcat  :0           :0               Mon Mar 22 17:43 - 18:47  (01:03)
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 17:39   still running
Then finally the sync on curie started failing with:
mar 22 16:46:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:46:42 curie smd-pull[27455]: smd-server: ERROR: Client aborted, removing /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.new and /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.mtime.new
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: Failed to copy Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: The destination already exists but its content differs.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: To fix this problem you have two options:
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - rename Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S by hand so that Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   can be copied without replacing it.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   Executing  cd; mv -n "Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "Maildir/.koumbit/cur/1616446002.1.localhost"  should work.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - run smd-push so that your changes to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   are propagated to the other mailbox
mar 22 16:46:42 curie smd-pull[27455]: default: smd-client@localhost: TAGS: error::context(copy-message) probable-cause(concurrent-mailbox-edit) human-intervention(necessary) suggested-actions(run(mv -n "/home/anarcat/.smd/workarea/Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "/home/anarcat/.smd/workarea/Maildir/.koumbit/tmp/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S") run(smd-push default))
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:46:42 curie systemd[3199]: Failed to start pull emails with syncmaildir.
It went on like this until I found the problem. This is, presumably, a good thing because those emails were not being destroyed. On angela, things looked like this:
-- Reboot --
mar 22 17:39:29 angela systemd[1677]: Started run notmuch new at least once a day.
mar 22 17:39:29 angela systemd[1677]: Started run smd-pull regularly.
mar 22 17:40:46 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: Maildir/.tor/new/1616446842.M285912P26118.marcos,S=886
0,W=8996: No such file or directory
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open requested file.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: Data transmission failed.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: This problem is transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: server sent ABORT or connection died
mar 22 17:43:18 angela smd-pull[3916]: default: smd-server@smd-server-anarcat: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested
-actions(retry)
mar 22 17:43:18 angela smd-pull[3916]: default: smd-client@localhost: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Failed with result 'exit-code'.
mar 22 17:43:18 angela systemd[1677]: Failed to start pull emails with syncmaildir.
mar 22 17:43:18 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:29 angela smd-pull[4847]: default: smd-client@localhost: TAGS: stats::new-mails(29), del-mails(0), bytes-received(401519), xdelta-received(38914)
mar 22 17:43:29 angela smd-pull[5600]: register: smd-client@localhost: TAGS: stats::new-mails(2), del-mails(0), bytes-received(92150), xdelta-received(471)
mar 22 17:43:29 angela systemd[1677]: smd-pull.service: Succeeded.
mar 22 17:43:29 angela systemd[1677]: Started pull emails with syncmaildir.
mar 22 17:43:29 angela systemd[1677]: Starting push emails with syncmaildir...
mar 22 17:43:32 angela smd-push[5693]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(217)
mar 22 17:43:33 angela smd-push[6575]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(219)
mar 22 17:43:33 angela systemd[1677]: smd-push.service: Succeeded.
mar 22 17:43:33 angela systemd[1677]: Started push emails with syncmaildir.
Notice how long it took to get the first error, in that first failure: it failed after 3 minutes! Presumably that's when it started deleting all that mail. And this is during pull, not push, so the error didn't come from angela.

Affected data It seems 2GB of mail from my main INBOX was destroyed. Another 2.4GB of spam (kept for training purposes) was also destroyed, along with 700MB of Sent mail. The rest is hard to figure out, because the folders are actually still there, just smaller. So I relied on ncdu to figure out the size changes. (Note that I don't really archive (or delete much of) my mail since I use notmuch, which is why the INBOX is so large...) Concretely, according to the notmuch-new.service which still runs periodically on marcos, here are the changes that happened on the server:
mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
That is basically the entire mail spool destroyed at first (283 898 messages), and then bits and pieces of it progressively re-added (134 645 messages), somehow, so 149 253 mails were lost, presumably.

Recovery I disabled the services all over the place:
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
(Well, technically, I did that only on angela, as I thought the problem was there. Luckily, curie kept going but it seems like it was harmless.) I made a backup of the mail spool on curie:
tar cf - Maildir/ | pv -s 14G | gzip -c > Maildir.tgz
Then I crossed my fingers and ran smd-push -v -s, as that was suggested by smd error codes themselves. That thankfully started restoring mail. It failed a few times on weird cases of files being duplicates, but I resolved this by following the instructions. Or mostly: I actually deleted the files instead of moving them, which made smd even unhappier (if there ever was such a thing). I had to recreate some of those files, so, lesson learned: do follow the advice smd gives you, even if it seems useless or strange. But then smd-push was humming along, uploading tens of thousands of messages, saturating the upload in the office, refilling the mail spool on the server... yaay!... ? Except... well, of course that didn't quite work: the mail spool on the server eventually started to grow beyond the size of the mail spool on the workstation. That is what smd-push eventually settled on:
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(151697), del-mails(0), bytes-received(7539147811), xdelta-received(10881198)
It recreated 151 697 emails, adding about 2000 emails to the pool, kind of from nowhere at all. On marcos, before:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,0 GiB [##########] /.notmuch
  717,3 MiB [#         ] /.Archives.2014
  498,2 MiB [#         ] /.feeds.debian-planet
  453,1 MiB [#         ] /.Archives.2012
  414,5 MiB [#         ] /.debian
  408,2 MiB [#         ] /.quoifaire
  389,8 MiB [          ] /.rapports
  356,6 MiB [          ] /.tor
  182,6 MiB [          ] /.koumbit
  179,8 MiB [          ] /tmp
   56,8 MiB [          ] /.nn
   43,0 MiB [          ] /.act-mtl
   32,6 MiB [          ] /.feeds.sysadvent
   31,7 MiB [          ] /.feeds.releases
   31,4 MiB [          ] /.Sent.2005
   26,3 MiB [          ] /.sage
   25,5 MiB [          ] /.freedombox
   24,0 MiB [          ] /.feeds.git-annex
   21,1 MiB [          ] /.Archives.2011
   19,1 MiB [          ] /.Sent.2003
   16,7 MiB [          ] /.bugtraq
   16,2 MiB [          ] /.mlug
 Total disk usage:   8,0 GiB  Apparent size:   7,6 GiB  Items: 184426
After:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,7 GiB [##########] /.notmuch
    2,7 GiB [#####     ] /.junk
    1,9 GiB [###       ] /cur
  717,3 MiB [#         ] /.Archives.2014
  659,3 MiB [#         ] /.Sent
  513,9 MiB [#         ] /.Archives.2012
  498,2 MiB [#         ] /.feeds.debian-planet
  449,6 MiB [          ] /.Archives.2015
  414,5 MiB [          ] /.debian
  408,2 MiB [          ] /.quoifaire
  389,8 MiB [          ] /.rapports
  380,8 MiB [          ] /.Archives.2013
  356,6 MiB [          ] /.tor
  261,1 MiB [          ] /.Archives.2011
  240,9 MiB [          ] /.koumbit
  183,6 MiB [          ] /.Archives.2010
  179,8 MiB [          ] /tmp
  128,4 MiB [          ] /.lists
  106,1 MiB [          ] /.inso-interne
  103,0 MiB [          ] /.github
   75,0 MiB [          ] /.nanog
   69,8 MiB [          ] /.full-disclosure
 Total disk usage:  16,2 GiB  Apparent size:  15,5 GiB  Items: 341143
That is 156 717 files more. On curie:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------------------------------------
    2,7 GiB [##########] /.junk
    2,3 GiB [########  ] /.notmuch
    1,9 GiB [######    ] /cur
  661,2 MiB [##        ] /.Archives.2014
  655,3 MiB [##        ] /.Sent
  512,0 MiB [#         ] /.Archives.2012
  447,3 MiB [#         ] /.Archives.2015
  438,5 MiB [#         ] /.feeds.debian-planet
  406,5 MiB [#         ] /.quoifaire
  383,6 MiB [#         ] /.debian
  378,6 MiB [#         ] /.Archives.2013
  303,3 MiB [#         ] /.tor
  296,0 MiB [#         ] /.rapports
  237,6 MiB [          ] /.koumbit
  233,2 MiB [          ] /.Archives.2011
  182,1 MiB [          ] /.Archives.2010
  127,0 MiB [          ] /.lists
  104,8 MiB [          ] /.inso-interne
  102,7 MiB [          ] /.register
   89,6 MiB [          ] /.github
   67,1 MiB [          ] /.full-disclosure
   66,5 MiB [          ] /.nanog
 Total disk usage:  13,3 GiB  Apparent size:  12,6 GiB  Items: 342465
Interestingly, there are more files, but less disk usage. It's possible the notmuch database there is more efficient. So maybe there's nothing to worry about. Last night's marcos backup has:
root@marcos:/home/anarcat# find /mnt/home/anarcat/Maildir | pv -l | wc -l
 341k 0:00:16 [20,4k/s] [                             <=>                                                                                                                                     ]
341040
... 341040 files, which seems about right, considering some mail was delivered during the day. An audit can be performed with hashdeep:
borg mount /media/sdb2/borg/::marcos-auto-2021-03-22 /mnt
hashdeep -c sha256 -r /mnt/home/anarcat/Maildir | pv -l -s 341k > Maildir-backup-manifest.txt
And then compared with:
hashdeep -c sha256 -k Maildir-backup-manifest.txt Maildir/
Some extra files should show up in the Maildir, and very few should actually be missing, because I rarely delete the previous day's mail on the very next day. The actual summary hashdeep gave me was:
hashdeep: Audit failed
   Input files examined: 0
  Known files expecting: 0
          Files matched: 339080
Files partially matched: 0
            Files moved: 782
        New files found: 107
  Known files not found: 106
So 107 files added, 106 deleted. Seems good enough for me... Postfix was stopped at Mar 22 21:12:59 to try and stop external events from confusing things even further. I reviewed the delivery log to see if mail that came in during the problem window disappeared:
grep 'dovecot:.*stored mail into mailbox' /var/log/mail.log |
  tail -20 |
  sed 's/.*msgid=<//;s/>.*//' |
  while read msgid; do
    notmuch count --exclude=false id:$msgid |
      grep 0 && echo $msgid missing;
  done
And things looked okay. Now of course if we go further back, we find mail I actually deleted (because I do do that sometimes), so it's hard to use this log as an audit trail. We can only hope that the curie spool is sufficiently coherent to be relied on. Worst case, we'll have to restore from last night's backup, but that's getting far away now: I get hundreds of mails a day in that mail spool, and resetting back to last night does not seem like a good idea. A dry run of smd-pull on angela seems to agree that it's missing some files:
default: smd-client@localhost: TAGS: stats::new-mails(154914), del-mails(0), bytes-received(0), xdelta-received(0)
... a number of mails somewhere in between the other two, go figure. A "wet" run of this was started, without deletion (-n), which gave us:
default: smd-client@localhost: TAGS: stats::new-mails(154911), del-mails(0), bytes-received(7658160107), xdelta-received(10837609)
Strange that it sync'd three fewer emails, but that's still better than nothing, and we have a mail spool on angela again:
anarcat@angela:~(main)$ notmuch new
purging with prefix '.': spam moved (0), ham moved (0), deleted (0), done
Note: Ignoring non-mail file: /home/anarcat/Maildir//.uidvalidity
Processed 1779 total files in 26s (66 files/sec.).
Added 1190 new messages to the database. Removed 3 messages. Detected 593 file renames.
tagging with prefix '.': spam, sent, feeds, koumbit, tor, lists, rapports, folders, done.
Notice how only 1190 messages were re-added; that is because I killed notmuch before it had time to remove all those mails from its database.

Possible causes I am totally at a loss as to why smd started destroying everything like it did. But a few things come to mind:
  1. I rewired my office on that day.
  2. This meant unplugging curie, the workstation.
  3. It has a bad CMOS battery (known problem), so it jumped around the time continuum a few times, sometimes by years.
  4. The smd services are run from a systemd unit with OnCalendar=*:0/2. I have heard that major time jumps can "pile up" the execution of jobs, and it seems that is what happened in this case (see the sketch below).
  5. It's possible that locking in smd is not as great as it could be, and that it corrupted its internal data structures on curie, which led it to command a destruction of the remote mail spool.
It's also possible that there was a disk failure on the server, marcos. But since it's running on a (software) RAID-1 array, and no errors have been found (according to dmesg), I don't think that's a plausible hypothesis.
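For reference (regarding cause 4 above), a timer with that schedule might look like the following. This is a minimal sketch, not my actual unit files: the file name and the Persistent= line are assumptions.
# ~/.config/systemd/user/smd-pull.timer (hypothetical)
[Unit]
Description=pull emails with syncmaildir every two minutes

[Timer]
# fire at every minute divisible by two
OnCalendar=*:0/2
# Persistent=true would also fire missed runs at activation,
# which could make the "pile up" effect worse after a clock jump
Persistent=false

[Install]
WantedBy=timers.target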

Lessons learned
  1. follow what smd says, even if it seems useless or strange.
  2. trust but verify: just back up everything before you do anything, especially the largest data set.
  3. daily backups are not great for email, unless you're ready to lose a day of email (which I'm not).
  4. hashdeep is great. I keep finding new use cases for it. Last time it was to audit my camera SD card to make sure I didn't forget anything, and now this. it's fast and powerful.
  5. borg is great too. the FUSE mount was especially useful, and it was pretty fast to explore the backup, even through that overhead: checksumming 15GB of mail took about 35 minutes, which gives a respectable 8MB/s, probably bottlenecked by the crap external USB drive I use for backups (!).
  6. I really need to finish my backup system so that I have automated offsite backups, although in this case that would actually have been much slower (certainly not 8MB/s!).

Workarounds and solutions I set up fake-hwclock on curie, so that the next power failure will not upset my clock that badly. I am thinking of switching to ZFS or BTRFS for most of my filesystems, so that I can use filesystem snapshots (including remotely!) as a backup strategy (a rough sketch follows at the end of this section). This seems so much more powerful than crawling the filesystem for changes, and allows for truly offsite backups protected from an attacker (hopefully). But it's a long way off. I'm also thinking of rebuilding my mail setup without smd. It's not the first time something like this has happened with smd. It's the first time I am more confident it's the root cause of the problem, however, and it makes me really nervous for the future. I have used offlineimap in the past and it seems it was finally ported to Python 3, so that could be an option again. isync/mbsync is another option, which I tried before but do not remember why I didn't switch. A complete redesign with something like getmail and/or nncp could also be an option. But alas, I lack the time to go crazy with those experiments. Somehow, doing like everyone else and just going with Google still doesn't seem to be an option for me. Screw big tech. But I am afraid they will win, eventually. In any case, I'm just happy I got mail again, strangely.
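Here is roughly what that snapshot idea would look like with ZFS, as a sketch only: the pool name (tank), dataset and remote host are made up, and I have not actually deployed this.
# take a snapshot of the dataset holding the mail spool
zfs snapshot tank/home@2021-03-23
# send it incrementally to an offsite machine over SSH
zfs send -i tank/home@2021-03-22 tank/home@2021-03-23 | ssh backup.example.com zfs receive tank/home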

19 March 2021

Antoine Beaupr : Securing my IRC (irssi, screen) session with dtach and systemd

A recent vulnerability in GNU screen caused some people to reconsider their commitment to the venerable terminal multiplexing program. GNU screen is probably used by thousands of old sysadmins around the world to run long-standing processes and, particularly, IRC sessions, which are especially vulnerable to arbitrary garbage coming on ... screen, so to speak. So this vulnerability matters, and you should definitely pay attention to it. If you haven't switched to tmux yet, now might be a good time to get your fingers trained. But don't switch to it just yet for your IRC session, and read on for a better, more secure solution. After all, it's not because we found this flaw in screen that it doesn't exist in tmux (or your favorite terminal emulator, for that matter, a much scarier thought).

Hardening my bouncer Back in March 2019, I had already switched away from screen for IRC, but not to tmux like many did, but to dtach. I figured that I didn't actually need multiplexing to run my long-running IRC session: I just needed to be able to reattach to the terminal. That's what dtach does. No windows, no panes, and, especially, no way to start a new shell, which is exactly the kind of hardening I need. So I came up with this, to start irssi:
dtach -N /run/$USER/dtach-irssi.socket irssi
To attach:
dtach -a /run/$USER/dtach-irssi.socket
Fairly simple no? Already one attack vector gone: evil attacker can't get a new shell through my terminal multiplexer, yay.

Splitting into another user But why stop there! Why am I running irssi as my main user anyways! Let's take the lessons from good UNIX security, and run this as a separate user altogether. This requires me to create another user, say foo-irc:
adduser foo-irc
... and run it as a systemd service, because how else are you going to start this thing anyways, cron? I came up with something like this unit file:
[Unit]
Description=IRC screen session
After=network.target
[Service]
Type=simple
Environment="TERM=screen.xterm-256color"
User=%i
RuntimeDirectory=%i
ExecStart=-/usr/bin/dtach -N /run/%i/dtach-irssi.socket irssi
ExecStop=-/bin/sh -c 'echo /quit stopping service... | exec /usr/bin/dtach -p /run/%i/dtach-irssi.socket'
ExecReload=-/bin/sh -c 'echo /restart | exec /usr/bin/dtach -p /run/%i/dtach-irssi.socket'
Notice this is a service template, because of the %i stuff. I don't actually remember how to enable this thing, but let's say you drop this in /etc/systemd/system/irssi@.service, then you run:
systemctl daemon-reload
And then, not sure about this bit, instantiate that template:
systemctl enable irssi@foo-irc.service
And then this should start the irssi session:
systemctl start irssi@foo-irc.service
To access the session:
sudo -u foo-irc dtach -a /run/foo-irc/dtach-irssi.socket
Obviously, you will probably need to migrate your irssi configuration over, otherwise you'll end up with a blank, old-school irssi. Take a moment to savor the view though. Nostalgia. Ah.

Hardening irssi But this is still not enough. That pesky foo-irc user can still launch arbitrary commands, thanks to irssi /exec (and a generous Perl scripting environment). Let's throw the entire kitchen sink at it and see what sticks. At this point, the unit file becomes too long to just maintain in a blog post (which would be silly, but not unheard of), so just look at this git repository instead.
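To give an idea of what "the kitchen sink" looks like, here is an illustrative drop-in with the kind of systemd sandboxing directives involved. This is a sketch, not the contents of that repository; overly aggressive settings will break irssi scripts, so tune to taste.
# /etc/systemd/system/irssi@.service.d/hardening.conf (hypothetical drop-in)
[Service]
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
# make the whole filesystem read-only, except the user's home
ProtectSystem=strict
ReadWritePaths=/home/%i
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
CapabilityBoundingSet=
SystemCallFilter=@system-service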

Password-less remote irssi The neat thing with this hardening is that I now feel comfortable enough with the setup to just add a password-less SSH key to that (basically throwaway) account: worst that can happen if someone gets a hold of that SSH key is they land in a heavily sandboxed irssi session. So yay, no password to jump on chat. Like a real client or something. Just make sure to secure the SSH key you'll deploy in authorized_keys with:
restrict,pty,command="dtach -a /run/foo-irc/dtach-irssi.socket" [...]
Obviously, make sure the keys are not writable by the user, by placing them somewhere outside their home directory, which might require hacking at your server's SSH configuration. Because otherwise a compromised user would be able to change their own authorized_keys, which could be bad.
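One way to do that, sketched below with hypothetical paths, is to point sshd at a root-owned location for that user's keys (the %u token expands to the user name; the file and its parent directories must not be writable by foo-irc):
# /etc/ssh/sshd_config.d/foo-irc.conf (or directly in sshd_config)
Match User foo-irc
    AuthorizedKeysFile /etc/ssh/authorized_keys/%u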

Configuration management And at this point, you may have noticed that you shouldn't actually have followed my instructions to the letter. Instead, just use this neat little Puppet module which does all of the above, but also includes a little wrapper so that mosh still works. It also includes instructions on how to set up your SSH keys. Enjoy, and let me know if (or rather, how) I messed up.

Updates
  1. it seems that dtach is not very active upstream: the last release (0.9) is from 2016, and the last commit (at the time of writing) is from 2017
  2. dtach is not necessarily safer than screen or tmux from arbitrary input from the outside, in fact there was a vulnerability on dtach CVE-2012-3368 that led to an attacker accessing stack memory (but maybe not code execution)
  3. after writing the Puppet module and publishing this article, I started to get weird behavior from dtach: I would leave the office at night and then return the next morning to find that I was timed out on servers. From my perspective, irssi noticed only when I re-attached the session:
    09:39:51 -!- Irssi: warning Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 [bitlbee] -!- Irssi: Connection lost to localhost
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 [gitter] -!- Irssi: Connection lost to irc.gitter.im
    09:39:51 [OFTC] -!- Irssi: Connection lost to irc.oftc.net
    09:42:34 [IMC] -!- Irssi: Connection lost to irc.indymedia.org
    09:42:34 -!- Irssi: Connection lost to irc.hackint.org
    09:42:34 -!- Irssi: Connection lost to chat.freenode.net
    09:42:34 -!- dtach_away: Set away
    
    from the outside I actually timed out a few minutes after I detached, which also makes for a weird asymmetry:
    22:31:00 -!- anarcat [~anarcat@ocean] has quit [Ping timeout: 250 seconds]
    
    that is eleven hours before the error I get.
  4. the mosh wrapper script seems to not work as well as it did before. somehow just running mosh $server hangs with a blank screen instead of instantly rejoining the session. I'm not sure it is related to the timeout problem but I did rewrite the wrapper before publication. this is the old version:
    #!/bin/sh
    # inspired by https://serverfault.com/questions/749474/ssh-authorized-keys-command-option-multiple-commands
    command="dtach -a /run/anarcat-irc/dtach-irssi.socket"
    case "$SSH_ORIGINAL_COMMAND" in
        mosh-server*)
        exec mosh-server -- $command
        ;;
        *)
        exec $command
        ;;
    esac
    
    I'm thinking of trying that one out for a while to see if it's related. The weirdest thing is that mosh "un-hangs" if I reattach with plain ssh, so there's definitely something fishy going on here.

19 October 2020

Antoine Beaupr : SSH 2FA with Google Authenticator and Yubikey

About a lifetime ago (5 years), I wrote a tutorial on how to configure my Yubikey for OpenPGP signing, SSH authentication and SSH 2FA. In there, I used the libpam-oath PAM plugin for authentication, but it turns out that had too many problems: users couldn't edit their own 2FA tokens and I had to patch it to avoid forcing 2FA on all users. The latter was merged in the Debian package, but never upstream, and the former was never fixed at all. So I started looking at alternatives and found the Google Authenticator libpam plugin. A priori, it's designed to work with phones and the Google Authenticator app, but there's no reason why it shouldn't work with hardware tokens like the Yubikey. Both use the standard HOTP protocol so it should "just work". After some fiddling, it turns out I was right and you can authenticate with a Yubikey over SSH. Here's that procedure so you don't have to second-guess it yourself.

Installation On Debian, the PAM module is shipped in the google-authenticator source package:
apt install libpam-google-authenticator
Then you need to add the module in your PAM stack somewhere. Since I only use it for SSH, I added this line on top of /etc/pam.d/sshd:
auth required pam_google_authenticator.so nullok
I also added the no_increment_hotp and debug options while debugging, to avoid having to renew the token all the time and to get more information about failures in the logs. Then reload ssh (not sure that's actually necessary):
service ssh reload
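Not shown above, but depending on your existing sshd configuration you may also need to let the PAM prompt appear at all, with something along these lines (a sketch; adjust to your setup, and note that ChallengeResponseAuthentication was renamed KbdInteractiveAuthentication in newer OpenSSH releases):
# /etc/ssh/sshd_config
ChallengeResponseAuthentication yes
# require the public key first, then the OTP prompt
AuthenticationMethods publickey,keyboard-interactive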

Creating or replacing tokens To create a new key, run this command on the server:
google-authenticator -c
This will prompt you for a bunch of questions. To get them all right, I prefer to just call the right ones on the commandline directly:
google-authenticator --counter-based --qr-mode=NONE --rate-limit=1 --rate-time=30 --emergency-codes=1 --window-size=3
Those are actually the defaults, if my memory serves me right, except for the --qr-mode and --emergency-codes (which can't be disabled so I only print one). I disable the QR code display because I won't be using the codes on my phone, but you would obviously keep it if you want to use the app.

Converting to a Yubikey-compatible secret Unfortunately, the encoding (base32) produced by the google-authenticator command is not compatible with the token expected by the ykpersonalize command used to configure the Yubikey (base16, AKA "hexadecimal", with a fixed 20-byte length). So you need a way to convert between the two. I wrote a program called oath-convert which basically does this:
read base32
add padding
convert to hex
print
Or, in Python:
import base64
import binascii

def convert_b32_b16(data_b32):
    remainder = len(data_b32) % 8
    if remainder > 0:
        # XXX: assume 6 chars are missing, the actual padding may vary:
        # https://tools.ietf.org/html/rfc3548#section-5
        data_b32 += "======"
    data_b16 = base64.b32decode(data_b32)
    if len(data_b16) < 20:
        # pad to 20 bytes
        data_b16 += b"\x00" * (20 - len(data_b16))
    return binascii.hexlify(data_b16).decode("ascii")
Note that the code assumes a certain token length and will not work correctly for other sizes. To use the program, simply call it with:
head -1 .google_authenticator | oath-convert
Then you paste the output in the prompt:
$ ykpersonalize -1 -o oath-hotp -o append-cr -a
Firmware version 3.4.3 Touch level 1541 Program sequence 2
 HMAC key, 20 bytes (40 characters hex) : [SECRET GOES HERE]
Configuration data to be written to key configuration 1:
fixed: m:
uid: n/a
key: h:[SECRET REDACTED]
acc_code: h:000000000000
OATH IMF: h:0
ticket_flags: APPEND_CR OATH_HOTP
config_flags: 
extended_flags: 
Commit? (y/n) [n]: y
Note that you must NOT pass the -o oath-hotp8 parameter to the ykpersonalize commandline, which we used to do in the Yubikey howto. That is because Google Authenticator tokens are shorter: it's less secure, but it's an acceptable tradeoff considering the plugin is actually maintained. There's actually a feature request to support 8-digit codes so that limitation might eventually be fixed as well. Thanks to the Google Authenticator people and Yubikey people for their support in establishing this procedure.

30 September 2020

Antoine Beaupr : Presentation tools

I keep forgetting how to make presentations. I had a list of tools in a wiki from a previous job, but that's now private and I don't see why I shouldn't share this (even if for myself!). So here it is. What's your favorite presentation tool?

Tips
  • if you have some text to present, outline keywords so that you can present your subject without reading every word
  • ideally, don't read from your slides - they are there to help people follow, not for people to read
  • even better: make your slides pretty with only a few words, or don't make slides at all
Further advice: I'm currently using Pandoc with PDF output (via a trip through LaTeX) for most slides, because PDFs are more reliable and portable than web pages. I've also used LibreOffice, Pinpoint, and S5 (through RST) in the past. I miss Pinpoint; too bad that it died. Some of my presentations are available in my GitLab.com account. See also my list of talks and presentations, which I can't seem to keep up to date.
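For the record, that kind of pipeline boils down to something like this (a sketch; the file names are made up):
# Markdown in, Beamer (LaTeX) out, rendered to PDF
pandoc -t beamer --pdf-engine=xelatex slides.md -o slides.pdf
# or standalone HTML slides with reveal.js
pandoc -t revealjs -s slides.md -o slides.html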

Tools

Beamer (LaTeX)
  • LaTeX class
  • Do not use directly unless you are a LaTeX expert or masochist, see Pandoc below
  • see also powerdot
  • Home page

Darkslide
  • HTML, Javascript
  • presenter notes, table of contents, Markdown, RST, Textile, themes, code samples, auto-reload
  • Home page, demo

Impress.js

Impressive
  • simply displays PDFs or images
  • page transitions, overview screen, highlighting
  • Home page

Libreoffice Impress
  • Powerpoint clone
  • Makes my life miserable
  • PDF export, presenter notes, outline view, etc
  • Home page, screenshots

Magicpoint
  • ancestor of everyone else (1997!)
  • text input format, image support, talk timer, slide guides, HTML/Postscript export, draw on slides, X11 output
  • no release since 2008
  • Home page

mdp and lookatme (commandline)

Pandoc
  • Allows converting from basically whatever into slides, including Beamer, DZSlides, reveal.js, slideous, slidy, Powerpoint
  • PDF, HTML, Powerpoint export, presentation notes, full screen background images
  • nice plain text or markdown input format
  • Home page, documentation

PDF Presenter
  • PDF presentation tool, shows presentation notes
  • basically "Keynote for Linux"
  • Home page, pdf-presenter-console in Debian

Pinpoint
  • Native GNOME app
  • Full screen slides, PDF export, live change, presenter notes, pango markup, video, image backgrounds
  • Home page
  • Abandoned since at least 2019

Reveal.js
  • HTML, Javascript
  • PDF export, Markdown, LaTeX support, syntax-highlighting, nested slides, speaker notes
  • Source code, demo

S5
  • HTML, CSS
  • incremental, bookmarks, keyboard controls
  • can be transformed from ReStructuredText (RST) with rst2s5 with python-docutils
  • Home page, demo

sent
  • X11 only
  • plain text, black on white, image support, and that's it
  • from the suckless.org elitists
  • Home page

Sozi
  • Entire presentation is one poster, zooming and jumping around
  • SVG + Javascript
  • Home page, demo

Other options Another option I have seriously considered is to just generate a series of images with good resolution, hopefully matching the resolution (or at least aspect ratio) of the output device. Then you flip through the images one by one. In that case, any of those image viewers (not an exhaustive list) would work: Update: it turns out I already wrote a somewhat similar thing when I did a recent presentation. If you're into rants, you might enjoy the README file accompanying the Kubecon rant presentation. TL;DR: "makes me want to scream" and "yet another unsolved problem space, sigh" (referring to "display images full-screen" specifically).

20 September 2020

Russell Coker: Links September 2020

MD5 cracker, find plain text that matches MD5 hash [1]. Debian Quick Image Baker Debian VM images for various architectures [2]. Insightful article on Scientific American about how dental and orthodontic problems are caused by our modern lifestyle [3]. Cory Doctorow wrote an insightful article for Locus Magazine about Intellectual Property [4]. He makes the case that we are losing our rights due to changes in the legal system that benefits corporations. Anarcat has an informative blog post about how to stop people using your Mailman installation to attack others [5]. Since version 1:2.1.27-1 the Debian package of Mailman has created a SUBSCRIBE_FORM_SECRET by default on install, but any installation upgraded from an older version of the Debian package won't have it.

15 July 2020

Antoine Beaupr : Goatcounter analytics in ikiwiki

I have started using Goatcounter for analytics after reading this LWN article called "Lightweight alternatives to Google Analytics". Goatcounter has an interesting approach to privacy in that it:
tracks sessions using a hash of the browser's user agent and IP address to identify the client without storing any personal information. The salt used to generate these hashes is rotated every 4 hours with a sliding window.
There was no Debian package for the project, so I filed a request for package and instead made a fork of the project to add a Docker image. This page documents how Goatcounter was setup from there...

Server configuration
  1. build the image from this fork
    docker build -t zgoat/goatcounter .
    
  2. create volume for db:
    docker volume create goatcounter
    
  3. start the server:
    exec docker run --restart=unless-stopped --volume="goatcounter:/home/user/db/" --publish 127.0.0.1:8081:8080 --detach zgoat/goatcounter serve -listen :8080 -tls none
    
  4. apache configuration:
    <VirtualHost *:80>
                ServerName analytics.anarc.at
                Redirect / https://analytics.anarc.at/
                DocumentRoot /var/www/html/
        </VirtualHost>
    <VirtualHost *:443>
            ServerName analytics.anarc.at
            Use common-letsencrypt-ssl analytics.anarc.at
            DocumentRoot /var/www/html/
            ProxyPass /.well-known/ !
            ProxyPass / http://localhost:8081/
            ProxyPassReverse / http://localhost:8081/
            ProxyPreserveHost on
    </VirtualHost>
    
  5. add analytics.anarc.at to DNS
  6. create a TLS cert with LE:
    certbot certonly --webroot  -d analytics.anarc.at --webroot-path /var/www/html/
    
    note that goatcounter has code to do this on its own, but we avoid it to follow our existing policies and simplify things
  7. create site:
    docker run -it --rm --volume="goatcounter:/home/user/db/" zgoat/goatcounter create -domain analytics.anarc.at -email anarcat+rapports@anarc.at
    
  8. add to ikiwiki template (see the snippet after this list)
  9. rebuild wiki:
    ikiwiki --setup ikiwiki.setup --rebuild --verbose
    

Remaining issues
  • Docker image should be FROM scratch, this is statically built golang stuff after all...
  • move to Docker Compose or podman instead of just starting the thing by hand (see the sketch after this list)
  • this is all super janky and should be put in config management somehow
  • remove "anarc.at" test site (the site is the analytics site, not the tracked site), seems like this is not possible yet
  • do log parsing instead of Javascript or 1x1 images?
  • compare with goaccess logs, probably at the end of july, to have two full weeks to compare
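As for the Docker Compose idea above, a minimal equivalent of the docker run command might look like this (untested sketch, reusing the existing named volume):
# docker-compose.yml (hypothetical)
version: "3"
services:
  goatcounter:
    image: zgoat/goatcounter
    restart: unless-stopped
    command: serve -listen :8080 -tls none
    ports:
      - "127.0.0.1:8081:8080"
    volumes:
      - goatcounter:/home/user/db/
volumes:
  goatcounter:
    external: true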

Fixed issues
  • cache headers are wrong (120ms!): deployed a workaround in Apache, reported as a bug upstream
  • remove self-referer: done, just a matter of configuring the URL in the settings. could this be automated too?
  • add pixel tracking for noscript users: done, but required a patch to ikiwiki (and I noticed another bug while doing it)
  • goatcounter monitor doesn't work with sqlite (fixed upstream!)
  • the :8080 port leaks in some places, namely in the "Site config" documentation; that is because I was using -port 8080, which was not necessary.

28 April 2020

Antoine Beaupr : Drowned my camera: dealing with liquid spills in electronics

Folks who dig deeply into this website might know that I have been taking more pictures recently, as I got a new camera since January 2018: a beautiful Fujifilm X-T2 that I really like. Recently, I went out on a photo shoot in the rain. It was intermittent, light rain when I left, so I figured the "weather proofing" (dpreview.com calls this "environmentally sealed") would keep the camera secure. After an hour of walking outside, however, the rain intensified and I was quickly becoming more and more soaked. Still trusting the camera would function, I carried on. But after about 90 minutes of dutiful work, the camera just turned off and wouldn't power back on. It had drowned. I couldn't believe it; "but this is supposed to be waterproof! This can't be happening!", I thought. I tried swapping out the battery for a fresh one, which was probably a bad idea (even if I was smart enough to do this under cover): still no luck, yet I could still not believe it was dead, so I figured I would look at it later when I was home. I still eventually removed the battery after a while, remembering that it mattered. Turns out the camera was really dead. Even at home, it wouldn't power up, even with fresh batteries. After closer inspection, the camera was as soaked as I was...
Two Sandisk memory cards with water droplets on them ...even the SD cards were wet!
I was filled with despair! My precious camera! I had been waiting for literally decades to find the right digital camera, one as close as possible to the good old film cameras I was used to. I was even working on black and white "film" to get back to basics, which turned into a project to witness the impact of the coronavirus on city life! All that was lost, or at least stopped: amazingly, the SD cards were just absolutely fine and survived the flooding without problem.
A one-way sign broken, fallen on the side in a gray cityscape The last photo my camera took before it died
A good photographer friend told me that this was actually fairly common: "if you shoot outside, get used to this, it will happen". So I tried "the rice trick": plunge your camera in a pile of rice and let it rest there for a long time. It didn't work so well: I didn't have a big enough container to hold the camera and the rice. I was also worried about rice particles inserting themselves into the camera holes, as I had opened all the ports to let it dry. I could also not keep myself from inserting a battery and trying it out again: amazingly, it powered up, only once, and died again. After shopping in desperation for desiccants (who would have thought you should keep those little bags from the stuff you order online!), I ended up buying a silica gel dehumidifier from Lee Valley ($13, the small one!) which comes in a neat little metal box. But that never arrived in time so I had to find another solution. My partner threw the idea out in jest, but the actual solution worked, and it was surprisingly simple!
My camera and lens drying in a food dehydrator, at 30 C with 22 hours left Tada! Turns out you can dehydrate hardware too!
We have a food dehydrator at home (a Sedna Express if you really want to know) since we do a lot of backpacking and canoe camping, but I never thought I would put electronics in there. Turns out a food dehydrator is perfect: it has per-degree temperature control that can go very low, and a timer. I set it to 30°C for 24 hours. (I originally set it to 40°C, but it smelled like plastic after a while, so my partner turned it off thinking it was melting the camera.) And now the camera is back! I was so happy! There is probably some permanent damage to the delicate circuitry in the camera. And I will probably not go back out in heavy rain again with the camera, or at least not without a rain jacket ($35 USD at B&H) on the camera. And I am now in a position to tell other people what to do if they suffer the same fate...

Tips for dealing with electronic liquid damage So, lessons learned...
  1. when you have a liquid spill over your electronics: IMMEDIATELY REMOVE ALL ELECTRIC POWER, including the battery! (this is another reason why all batteries should be removable)
  2. if the spill is "sticky" (e.g. coffee, beer, maple syrup, etc.) or "salty", do try to wash it with water, yet without flooding it any further (delicate balance, I know). some devices are especially well adapted to this: I have washed a keyboard with a shower head, drowning the thing completely, and it worked fine after drying.
  3. do NOT power it back on until you are certain the equipment is dry
  4. let the electronic device dry for 24 to 48 hours with all ports open in a humidity-absorbing environment: a bag of rice works, but a food dehydrator is best. make sure the rice doesn't get stuck inside the machine: use a small mesh bag if necessary
  5. once you are confident the device has dried, fiddle with the controls and see if water comes out: it might not have dried because it was stuck inside a button or dial. if dry, try powering it back on and watch the symptoms. if it's still weird, try drying it for another day.
  6. if you get tired of waiting and the machine doesn't come back up, you will have to send it to the repair shop or open it up yourself to see if there is soldering damage you can fix.
I hope it might help careless people who dropped their coffee or ran out in the rain, believing the hype of waterproof cameras. Amateur tip: waterproof cameras are not waterproof...

15 April 2020

Antoine Beaupr : OpenDKIM configuration to send debian.org email

Debian.org added support for DKIM in 2020. To configure this on my side, I had to do the following, on top of my email configuration.
  1. add this line to /etc/opendkim/signing.table:
    *@debian.org marcos-debian.anarcat.user
    
  2. add this line to /etc/opendkim/key.table:
    marcos-debian.anarcat.user debian.org:marcos-debian.anarcat.user:/etc/opendkim/keys/marcos-debian.anarcat.user.private
    
    Yes, that's quite a mouthful! That magic selector is long in that way because it needs a special syntax (specifically the .anarcat.user suffix) for Debian to be happy. The -debian string is to tell me where the key is published. The marcos prefix is to remind me where the private key is used.
  3. generate the key with:
    opendkim-genkey --directory=/etc/opendkim/keys/ --selector=marcos-debian.anarcat.user --domain=debian.org --verbose
    
    This creates the DNS record in /etc/opendkim/keys/marcos-debian.anarcat.user.txt (alongside the private key in .key).
  4. restart OpenDKIM:
    service opendkim restart
    
    The DNS record will look something like this:
    marcos-debian.anarcat.user._domainkey   IN  TXT ( "v=DKIM1; h=sha256; k=rsa; "
    "p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtKzBK2f8vg5yV307WAOatOhypQt3ANQ95iDaewkVehmx42lZ6b4PzA1k5DkIarxjkk+7m6oSpx5H3egrUSLMirUiMGsIb5XVGBPFmKZhDVmC7F5G1SV7SRqqKZYrXTufRRSne1eEtA31xpMP0B32f6v6lkoIZwS07yQ7DDbwA9MHfyb6MkgAvDwNJ45H4cOcdlCt0AnTSVndcl"
    "pci5/2o/oKD05J9hxFTtlEblrhDXWRQR7pmthN8qg4WaNI4WszbB3Or4eBCxhUdvAt2NF9c9eYLQGf0jfRsbOcjSfeus0e2fpsKW7JMvFzX8+O5pWfSpRpdPatOt80yy0eqpm1uQIDAQAB" )  ; ----- DKIM key marcos-debian.anarcat.user for debian.org
    
  5. The "p=MIIB..." string needs to be joined together, without the quotes and the p=, and sent in a signed email to changes@db.debian.org:
    -----BEGIN PGP SIGNED MESSAGE-----
    dkimPubKey: marcos.anarcat.user MIIB[...]
    -----BEGIN PGP SIGNATURE-----
    [...]
    
  6. Wait a few minutes for DNS to propagate. You can check whether the record has propagated with:
    host -t TXT marcos-debian.anarcat.user._domainkey.debian.org nsp.dnsnode.net
    
    (nsp.dnsnode.net being one of the NS records of the debian.org zone.)
If all goes well, the tests should pass when sending from your server as anarcat@debian.org.
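Before (or in addition to) the external tests below, opendkim-testkey, shipped in the opendkim-tools package, can confirm that the key published in DNS matches the private key on disk. This is just a sketch, reusing the selector and paths above:
# check that the published DNS record matches the private key on disk
opendkim-testkey -d debian.org -s marcos-debian.anarcat.user \
    -k /etc/opendkim/keys/marcos-debian.anarcat.user.private -vvv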

Testing Test messages can be sent to dkimvalidator, mail-tester.com or check-auth@verifier.port25.com. Those tools will run Spamassassin on the received emails and report the results. What you are looking for is:
  • -0.1 DKIM_VALID: Message has at least one valid DKIM or DK signature
  • -0.1 DKIM_VALID_AU: Message has a valid DKIM or DK signature from author's domain
  • -0.1 DKIM_VALID_EF: Message has a valid DKIM or DK signature from envelope-from domain
If one of those is missing, then you are doing something wrong and your "spamminess" score will be worse. The latter is especially tricky as it validates the "Envelope From", which is the MAIL FROM: address as sent by the originating MTA, which you see as from=<> in the Postfix logs. The following will show up anyway as soon as you have a signature; that's normal:
  • 0.1 DKIM_SIGNED: Message has a DKIM or DK signature, not necessarily valid
And this might happen if you have an ADSP record but do not correctly sign the message with a domain field that matches the record:
  • 1.1 DKIM_ADSP_ALL No valid author signature, domain signs all mail
That's bad and will affect your spam score badly. I fixed that issue by using a wildcard key in the key table:
--- a/opendkim/key.table
+++ b/opendkim/key.table
@@ -1 +1 @@
-marcos anarc.at:marcos:/etc/opendkim/keys/marcos.private
+marcos %:marcos:/etc/opendkim/keys/marcos.private

References This is a copy of a subset of my more complete email configuration.

18 March 2020

Antoine Beaupré: How can I trust this git repository?

Join me in the rabbit hole of git repository verification, and how we could improve it.

Problem statement As part of my work on automating install procedures at Tor, I ended up doing things like:
git clone REPO
./REPO/bootstrap.sh
... something eerily similar to the infamous curl pipe bash method which I often decry. As a short-term workaround, I relied on the SHA-1 checksum of the repository to make sure I had the right code, by running this both on a "trusted" (i.e. "local") repository and the remote, then visually comparing the output:
$ git show-ref master
9f9a9d70dd1f1e84dec69a12ebc536c1f05aed1c refs/heads/master
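One way to script that comparison, as a rough sketch (REPO stands for the remote URL, and this is of course only as strong as SHA-1 itself), is to query the remote directly with git ls-remote and compare it against the local ref:
# hash of the branch as advertised by the remote
git ls-remote REPO refs/heads/master
# hash of the same branch in the local, "trusted" clone
git show-ref refs/heads/master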
One problem with this approach is that SHA-1 is now considered as flawed as MD5, so it can't be used as an authentication mechanism anymore. It's also fundamentally difficult for humans to compare hashes. The other flaw with comparing local and remote checksums is that we assume we trust the local repository. But how can I trust that repository? I can either:
  1. audit all the code present and all the changes done to it after
  2. or trust someone else to do so
The first option here is not practical in most cases. In this specific use case, I have audited the source code -- I'm the author, even -- what I need is to transfer that code over to another server. (Note that I am replacing those procedures with Fabric, which makes this use case moot for now as the trust path narrows to "trust the SSH server" which I already had anyways. But it's still important for my fellow Tor developers who worry about trusting the git server, especially now that we're moving to GitLab.) But anyways, in most cases, I do need to trust some other fellow developer I collaborate with. To do this, I would need to trust the entire chain between me and them:
  1. the git client
  2. the operating system
  3. the hardware
  4. the network (HTTPS and the CA cartel, specifically)
  5. then the hosting provider (and that hardware/software stack)
  6. and then backwards all the way back to that other person's computer
I want to shorten that chain as much as possible, make it "peer to peer", so to speak. Concretely, it would eliminate the hosting provider and the network as potential attackers.

OpenPGP verification My first reaction is (perhaps perversely) to "use OpenPGP" for this. I figured that if I sign every commit, then I can just check the latest commit and see if the signature is good. The first problem here is that this is surprisingly hard. Let's pick some arbitrary commit I did recently:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
diff --git a/fabric_tpa/pytest.ini b/fabric_tpa/pytest.ini
new file mode 100644
index 0000000..71004ea
--- /dev/null
+++ b/fabric_tpa/pytest.ini
@@ -0,0 +1,3 @@
+[pytest]
+# we inline tests directly in the source code
+python_files = *.py
That's the output of git log -p in my local repository. I signed that commit, yet git log is not telling me anything special. To check the signature, I need something special: --show-signature, which looks like this:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature faite le lun 16 mar 2020 14:37:53 EDT
gpg:                avec la clef RSA 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Bonne signature de « Antoine Beaupré <anarcat@orangeseeds.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@torproject.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@anarc.at> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@koumbit.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@debian.org> » [ultime]
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
Can you tell if this is a valid signature? If you speak a little French, maybe you can! But even if you do, you are unlikely to see that output on your own computer. What you would see instead is:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
Important part: Can't check signature: No public key. No public key. Because of course you would see that. Why would you have my key lying around, unless you're me? Or, to put it another way, why would that server I'm installing from scratch have a copy of my OpenPGP certificate? Because I'm a Debian developer, my key is actually part of the 800 keys in the debian-keyring package, signed by the APT repositories. So I have a trust path. But that won't work for someone who is not a Debian developer. It will also stop working when my key expires in that repository, as it already has on Debian buster (current stable). So I can't assume I have a trust path there either. That said, one could work with a trusted keyring like we do in the Tor and Debian projects, and only work inside those projects. But I still feel uncomfortable with those commands. Both git log and git show will happily succeed (return code 0 in the shell) even though the signature verification failed on the commits. Same with git pull and git merge, which will happily move your branch ahead even if the remote has unsigned or badly signed commits. To actually verify commits (or tags), you need the git verify-commit (or git verify-tag) command, which seems to do the right thing:
$ LANG=C.UTF-8 git verify-commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
[1]$
At least it fails with some error code (1, above). But it's not flexible: I can't use it to verify that a "trusted" developer (say one that is in a trusted keyring) signed a given commit. Also, it is not clear what a failure means. Is a signature by an expired certificate okay? What if the key is signed by some random key in my personal keyring? Why should that be trusted?
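As a rough workaround (it doesn't fix the flexibility problem, but at least it pins which certificates GnuPG considers at all), one can run the verification inside a dedicated GnuPG home containing only a trusted keyring. A sketch, assuming the debian-keyring package and its usual file location:
# verify against a dedicated keyring instead of the personal one
export GNUPGHOME=$(mktemp -d)
gpg --import /usr/share/keyrings/debian-keyring.gpg
git verify-commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
Note that the signature will still be reported as coming from a key with unknown trust unless ownertrust is set explicitly; the point here is only to restrict which certificates are considered in the first place.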

Worrying about git and GnuPG In general, I'm worried about git's implementation of OpenPGP signatures. There have been numerous cases of interoperability problems with GnuPG specifically that led to security issues, like EFAIL or SigSpoof. It would be surprising if such a vulnerability did not exist in git. Even if git did everything "just right" (which I have myself found impossible to do when writing code that talks with GnuPG), what does it actually verify? The commit's SHA-1 checksum? The tree's checksum? The entire archive as a zip file? I would bet it signs the commit's SHA-1 sum, but I just don't know, off the top of my head, and neither git-commit nor git-verify-commit say exactly what is happening. I had an interesting conversation with a fellow Debian developer (dkg) about this and we had to admit those limitations:
<anarcat> i'd like to integrate pgp signing into tor's coding practices more, but so far, my approach has been "sign commits" and the verify step was "TBD" <dkg> that's the main reason i've been reluctant to sign git commits. i haven't heard anyone offer a better subsequent step. if torproject could outline something useful, then i'd be less averse to the practice. i'm also pretty sad that git remains stuck on sha1, esp. given the recent demonstrations. all the fancy strong signatures you can make in git won't matter if the underlying git repo gets changed out from under the signature due to sha1's weakness
In other words, even if git implements the arcane GnuPG dialect just so, and would allow us to set up the trust chain just right, and would give us meaningful and workable error messages, it would still fail because it's still stuck with SHA-1. There is work underway to fix that, but in February 2020, Jonathan Corbet described that work as being in a "relatively unstable state", which is hardly something I would like to trust to verify code. Also, when you clone a fresh new repository, you might get an entirely different repository, with a different root and set of commits. The concept of "validity" of a commit, in itself, is hard to establish in this case, because a hostile server could put you backwards in time, on a different branch, or even in an entirely different repository. Git will warn you about a different repository root with warning: no common commits but that's easy to miss. And complete branch switches, rebases and resets from upstream are hardly more noticeable: only a tiny plus sign (+) instead of a star (*) will tell you that a reset happened, along with a warning (forced update) on the same line. Miss those and your git history can be compromised.
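One small mitigation for those silent rewrites is to check explicitly, after fetching, that the tip you previously had is still an ancestor of the new remote tip. A minimal sketch, using origin/master as an example branch:
# remember the tip we last saw, fetch, then check for history rewrites
old=$(git rev-parse origin/master)
git fetch origin
git merge-base --is-ancestor "$old" origin/master \
    || echo "WARNING: origin/master was rewritten (forced update?)"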

Possible ways forward I don't consider the current implementation of OpenPGP signatures in git to be sufficient. Maybe, eventually, it will mature away from SHA-1 and the interface will be more reasonable, but I don't see that happening in the short term. So what do we do?

git evtag The git-evtag extension is a replacement for git tag -s. It's not designed to sign commits (it only signs and verifies tags), but at least it uses a stronger algorithm (SHA-512) to checksum the tree, and will include everything in that tree, including blobs. If that sounds expensive to you, don't worry too much: it takes about 5 seconds to tag the Linux kernel, according to the author. Unfortunately, that checksum is then signed with GnuPG, in a manner similar to git itself, in that it exposes GnuPG output (which can be confusing) and is likely similarly vulnerable to mis-implementation of the GnuPG dialect as git itself. It also does not allow you to specify a keyring to verify against, so you need to trust GnuPG to make sense of the garbage that lives in your personal keyring (and, trust me, it doesn't). And besides, git-evtag is fundamentally the same as signed git tags: checksum everything and sign with GnuPG. The difference is it uses SHA-512 instead of SHA-1, but that's something git will eventually fix itself anyways.
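For reference, usage looks much like regular signed tags; a rough sketch based on the project's documentation, with an arbitrary tag name:
# create a signed tag whose checksum covers the whole tree, blobs included
git evtag sign v1.0
# later, verify that tag
git evtag verify v1.0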

kernel patch attestations The kernel also faces this problem. Linus Torvalds signs the releases with GnuPG, but patches fly all over the mailing lists without any form of verification apart from clear-text email. So Konstantin Ryabitsev has proposed a new protocol to sign git patches which uses SHA-256 to checksum the patch metadata, commit message and the patch itself, and then sign that with GnuPG. It's unclear to me what, if anything, this solves at all. As dkg argues, it would seem better to add OpenPGP support to git-send-email and teach git tools to recognize that (e.g. git-am), at least if you're going to keep using OpenPGP anyways. And furthermore, it doesn't resolve the problems associated with verifying a full archive either, as it only attests "patches".

jcat Unhappy with the current state of affairs, the author of fwupd (Richard Hughes) wrote his own protocol as well, called jcat, which provides signed "catalog files" similar to the ones provided in Microsoft Windows. It consists of "gzip-compressed JSON catalog files, which can be used to store GPG, PKCS-7 and SHA-256 checksums for each file". So yes, it is yet again another wrapper to GnuPG, probably with all the flaws detailed above, on top of being a niche implementation, disconnected from git.

The Update Framework One more thing dkg correctly identified is:
<dkg> anarcat: even if you could do exactly what you describe, there are still some interesting wrinkles that i think would be problems for you. the big one: "git repo's latest commits" is a loophole big enough to drive a truck through. if your adversary controls that repo, then they get to decide which commits to include in the repo. (since every git repo is a view into the same git repo, just some have more commits than others)
In other words, unless you have a repository that has frequent commits (either because of activity or because a bot generates fake commits), you have to rely on the central server to decide what "the latest version" is. This is the kind of problem that binary package distribution systems like APT and TUF solve correctly. Unfortunately, those don't apply to source code distribution, at least not in git form: TUF only deals with "repositories" and binary packages, and APT only deals with binary packages and source tarballs. That said, there's actually no reason why git could not support the TUF specification. Maybe TUF could be the solution to ensure end-to-end cryptographic integrity of the source code itself. OpenPGP-signed tarballs are nice, and signed git tags can be useful, but from my experience, a lot of OpenPGP (or, more accurately, GnuPG) derived tools are brittle and do not offer clear guarantees, and definitely not to the level that TUF tries to address. This would require changes on the git servers and clients, but I think it would be worth it.

Other Projects

OpenBSD There are other tools trying to do parts of what GnuPG is doing, for example minisign and OpenBSD's signify. But they do not integrate with git at all right now. Although I did find a hack to use signify with git, it's kind of gross...
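One way to glue them together by hand (not necessarily what the linked hack does; just a sketch assuming the signify-openbsd package and an existing key pair) is to sign a checksum of a git archive export:
# checksum a tar export of the tree at HEAD...
git archive --format=tar HEAD | sha256sum > HEAD.sha256
# ...then sign, and later verify, that checksum with signify
signify-openbsd -S -s key.sec -m HEAD.sha256
signify-openbsd -V -p key.pub -m HEAD.sha256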

Golang Unsurprisingly, this is a problem everyone is trying to solve. Golang is planning on hosting a notary which would leverage a "certificate-transparency-style tamper-proof log" which would be run by Google (see the spec for details). But that doesn't resolve the "evil server" attack, if we treat Google as an adversary (and we should).

Python Python had OpenPGP going for a while on PyPI, but it's unclear if it ever did anything at all. Now the plan seems to be to use TUF but my hunch is that the complexity of the specification is keeping that from moving ahead.

Docker Docker and the container ecosystem have, in theory, moved to TUF in the form of Notary, "a project that allows anyone to have trust over arbitrary collections of data". In practice however, in my somewhat limited experience, setting up TUF and image verification in Docker is far from trivial.
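The entry point, at least, is simple: Docker content trust (the Notary integration) is toggled with a single environment variable. A sketch, with an arbitrary official image:
# refuse images that are not signed in Notary for this session
export DOCKER_CONTENT_TRUST=1
docker pull debian:buster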

Android and iOS Even in what is possibly one of the strongest models (at least in terms of user friendliness), mobile phones are surprisingly unclear about those kinds of questions. I had to ask if Android had end-to-end authentication and I am still not clear on the answer. I have no idea what iOS does.

Conclusion One of the core problems with everything here is the perennial usability problem of cryptography, and specifically the usability of verification procedures. We have become pretty good at encryption. The harder part (and a requirement for proper encryption) is verification. It seems that problem still remains unsolved, in terms of usability. Even Signal, widely considered to be a success in terms of adoption and usability, doesn't properly solve that problem, as users regularly ignore "The safety number has changed" warnings... So, even though they deserve a lot of credit in other areas, it seems unlikely that hardcore C hackers (e.g. git and kernel developers) will be able to resolve that problem without at least a little bit of help. And since TUF seems to be the state-of-the-art specification in this area, it would seem wise to start adopting it in the git community as well. Update: git 2.26 introduced a new gpg.minTrustLevel option to "tell various signature verification codepaths the required minimum trust level", presumably to control how Git will treat keys in your keyrings, assuming the "trust database" is valid and up to date. For an interesting narrative of how "normal" (without PGP) git verification can fail, see also A Git Horror Story: Repository Integrity With Signed Commits.
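For reference, the gpg.minTrustLevel option mentioned in the update above is a regular git configuration knob; a minimal sketch (marginal is one of the accepted levels, alongside undefined, never, fully and ultimate):
# require at least marginal trust for signature verification to succeed
git config --global gpg.minTrustLevel marginal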

11 March 2020

Antoine Beaupré: Font changes

I have worked a bit on the fonts I use recently. From the main font I use every day in my text editor and terminals to this very website, I did a major and (hopefully) thoughtful overhaul of my typography, in the hope of making things easier to use and, to be honest, just prettier.

Editor and Terminal: Fira mono This all started when I found out about the Jetbrains Mono font. I found the idea of ligatures fascinating: the result is truly beautiful. So I did what I often do (sometimes to the despair of some fellow Debian members) and filed an RFP to document my research. As it turns out, Jetbrains Mono is not free enough to be packaged in Debian, because it requires proprietary tools to build. I nevertheless figured I could try another font, so I looked at other monospace alternatives. I found the following packages in Debian: Those are also "programmer fonts" that caught my interest but somehow didn't land in Debian yet: Because Fira Code had ligatures, I ended up giving it a shot. I really like the originality of the font. See, for example, how the @ sign looks when compared to my previous font, Liberation Mono:
A dark terminal window showing the prompt '[SIGPIPE]anarcat@angela:~(master)$' in the Liberation Mono font Liberation Mono
A dark terminal window showing the prompt '[SIGPIPE]anarcat@angela:~(master)$' in the Fira mono font Fira Mono
Interestingly, a colleague (thanks ahf!) pointed me to the Practical Typography post "Ligatures in programming fonts: hell no", which makes the very convincing argument that ligatures are downright dangerous in programming environments. In my experience with those fonts, they also did not always give the result I expected. I also remembered that the Emacs Haskell mode would have this tendency of inserting crazy syntactic sugar like this in source code without being asked, which I found extremely frustrating. Besides, Emacs doesn't support ligatures, unless you count such horrendous hacks which operate at display time. That's because Emacs' display layer is not based on a modern rendering library like Pango but on some scary legacy code that very few people understand. On top of the complexity of the codebase, there is also resistance to including a modern font stack. So I ended up using Fira mono everywhere I use fixed-width fonts, even though it's not packaged in Debian. That involves the following configuration in my .Xresources (no, I haven't switched to Wayland):
! font settings
Emacs*font: Fira mono
rofi*font: Fira mono 12
! Symbola is to get emojis to display correctly, apt install fonts-symbola
!URxvt*font: xft:Monospace,xft:Symbola
URxvt*font: xft:Fira mono,xft:Symbola
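As an aside, the new resources don't take effect until they are reloaded; a minimal sketch, assuming the standard xrdb tool from x11-xserver-utils:
# reload the X resources so new terminals and programs pick up the font change
xrdb -merge ~/.Xresources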
I also dropped this script in ~/.local/share/fonts/download.sh, for lack of a better way, to download all the fonts I'm interested in:
#!/bin/sh
set -e
set -u
wget --no-clobber --continue https://practicaltypography.com/fonts/charter.zip
unzip -u charter.zip
wget --no-clobber --continue https://github.com/adobe-fonts/source-serif-pro/releases/download/3.001R/source-serif-pro-3.001R.zip
unzip -u source-serif-pro-3.001R.zip
wget --no-clobber --continue https://github.com/adobe-fonts/source-sans-pro/releases/download/3.006R/source-sans-pro-3.006R.zip
unzip -u source-sans-pro-3.006R.zip
wget --no-clobber --continue https://github.com/adobe-fonts/source-code-pro/releases/download/2.030R-ro%2F1.050R-it/source-code-pro-2.030R-ro-1.050R-it.zip
unzip -u source-code-pro-2.030R-ro-1.050R-it.zip
wget --no-clobber --continue -O Fira-4.202.zip https://github.com/mozilla/Fira/archive/4.202.zip || true
unzip -u Fira-4.202.zip
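After running the script, fontconfig needs to notice the new files; a short sketch, assuming the standard fontconfig tools are installed:
# refresh the per-user font cache and confirm the fonts are visible
fc-cache -f ~/.local/share/fonts
fc-list | grep -i fira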
Update: I forgot to mention one motivation behind this was to work around a change in the freetype interpreter, discussed in bug 866685 and my upgrades documentation.

Website: Charter That "hell no" article got me interested in the Practical Typography web book, which I read over the weekend. It was an eye opener and I realized I had already internalized some of those concepts; in fact it's probably after reading the Typography in ten minutes guide that I ended up trying Fira Sans a few years ago. I have removed that font since then, however, after realising it was taking up an order of magnitude more bandwidth than the actual page content. I really loved the book, so much that I actually bought it. I liked the concept of it, the look, and the fact that it's a living document. There's a lot of typography work I would need to do on this site to catch up with the recommendations from Matthew Butterick. Switching fonts is only one part of this, but it's something I was excited to work on. So I sat down and reviewed the free fonts Butterick recommends and tried out a few. I ended up settling on Charter, a relatively old (in terms of computing) font designed by Matthew Carter (of Verdana fame) in 1987. Charter really looks great and is surprisingly small. While a single variant of Fira weighs between 96KiB (Fira Sans Condensed) and 308KiB (Fira Sans Medium Italic), Charter is only 28KiB! While it's still about as large as most of my articles, I found it was a better compromise and decided to make the jump. This site is now in serif, which is a huge change for me. The change was done with the following CSS:
h1, h2, h3, h4, h5, body {
    /* 
     * Charter: Butterick's favorite, freely available, found on https://practicaltypography.com/free-fonts.html
     * Palatino: Mac OS
     * Palatino Linotype: Windows
     * Noto serif: Android, packaged in Debian
     * Liberation serif: Linux fallback
     * Serif: fallback
     */
    font-family: Charter, Palatino, "Palatino Linotype", "Noto serif", "Liberation serif", serif;
    /* Charter is available from https://practicaltypography.com/charter.html under the liberal Bitstream license */
}
I have also decided to outline headings by making them slanted, using the beautiful italic version of Charter:
h1, h2, h3, h4, h5 {
    font-style: italic;
}
I also made the point size larger, if the display is large enough:
/* enlarge body point size for charter for larger displays */
@media (min-device-width: 750px) {
    body {
        font-size: 20px;
        line-height: 1.3; /* default in FF is ~1.48 */
    }
}
Modern displays (and the fonts rendered on them) have a much higher resolution, and the point size on my website was really getting too small to be readable. This, in turn, required a change to the max-width:
#content {
    /* about 2.5 alphabets with charter */
    max-width: 35em;
}
I have kept the footers and "UI" elements in sans-serif though, and kept those aligned with operating system defaults or "system fonts":
/* some hacking at typefaces to get some fresh zest in here
 * fallbacks from:
 * https://en.wikipedia.org/wiki/List_of_typefaces_included_with_Microsoft_Windows
 * https://en.wikipedia.org/wiki/List_of_typefaces_included_with_macOS
 */
.navbar, .footer {
    /*
     * Avenir: Mac, quite different but should still be pretty
     * Gill sans: Mac, Windows, somewhat similar to Avenir (indulge me)
     * Noto sans: Android, packaged in Debian
     * Open sans: Texlive extras AKA Linux, packaged in Debian
     * Fira sans: Mozilla's Firefox OS
     * Liberation sans: Linux fallback
     * Helvetica: general fallback
     * Sans-serif: fallback
     */
    font-family: Avenir, "Gill sans", "Noto sans", "Open sans", "Fira sans", "Liberation sans", Helvetica, sans-serif;
    /* Fira is available from https://github.com/mozilla/Fira/ under the SIL Open Font License */
    /* alternatively, just use system fonts for "controls" instead:
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", "Roboto", "Oxygen", "Ubuntu", "Cantarell", "Fira Sans", "Droid Sans", "Helvetica Neue", sans-serif;
    gitlab uses: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Noto Sans", Ubuntu, Cantarell, "Helvetica Neue", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji"
    */
}
I've only done some preliminary testing of how this will look. Although I tested on a few devices (my phone, e-book tablet, an iPad, and of course my laptop), I fully expect things to break on your device. Do let me know if things look better or worse. For future comparison, my site is well indexed in the Internet Wayback Machine and can be used to look at the site before the change. For example, compare the previous article here with its earlier style. The changes to the theme are of course available in my custom ikiwiki bootstrap theme (see in particular commits 0bca0fb7 and d1901fb8), as usual. Enjoy, and let me know what you think! PS: I considered just setting the Charter font in CSS and not adding it as a @font-face. I'm still considering that option and might do so if the performance cost is too big. The Fira mono font is actually set like this for the preformatted sections of the site, but because it's more common (and it's too big) I haven't added it as a font-face. You might want to download the font locally to benefit from the full experience as well. PPS: As it turns out, an earlier version of this post featured exactly that: a non-webfont version of Charter, which works fine if you have a good Charter font available. But it looks absolutely terrible if, like many Linux users, you have the nasty bitmap font shipped with xfonts-100dpi and xfonts-75dpi. So I fixed that by using the webfont, and it's unlikely this site will render reasonably well on Linux without it until those packages are removed or bitmap font rendering is disabled.

2 November 2017

Antoine Beaupré: October 2017 report: LTS, feed2exec beta, pandoc filters, git mediawiki

Debian Long Term Support (LTS) This is my monthly Debian LTS report. This time I worked on the famous KRACK attack, git-annex, golang and the continuous stream of GraphicsMagick security issues.

WPA & KRACK update I spent most of my time this month on the Linux WPA code, to backport it to the old (~2012) wpa_supplicant release. I first published a patchset based on the patches shipped after the embargo for the oldstable/jessie release. After feedback from the list, I also built packages for i386 and ARM. I have also reviewed the WPA protocol to make sure I understood the implications of the changes required to backport the patches. For example, I removed the patches touching the WNM sleep mode code as that was introduced only in the 2.0 release. Chunks of code regarding state tracking were also not backported as they are part of the state tracking code introduced later, in 3ff3323. Finally, I still have concerns about the nonce setup in patch #5. In the last chunk, you'll notice peer->tk is reset, so that a new TK gets negotiated. The other approach I considered was to backport 1380fcbd9f ("TDLS: Do not modify RNonce for an TPK M1 frame with same INonce") but I figured I would play it safe and not introduce further variations. I should note that I share Matthew Green's observations regarding the opacity of the protocol. Normally, network protocols are freely available and security researchers like me can easily review them. In this case, I would have needed to read the opaque 802.11i-2004 PDF which is behind a TOS wall at the IEEE. I ended up reading up on the IEEE_802.11i-2004 Wikipedia article which gives a simpler view of the protocol. But it's a real problem to see such critical protocols developed behind closed doors like this. At Guido's suggestion, I sent the final patch upstream explaining the concerns I had with the patch. I have not, at the time of writing, received any response from upstream about this, unfortunately. I uploaded the fixed packages as DLA 1150-1 on October 31st.

Git-annex The next big chunk on my list was completing the work on git-annex (CVE-2017-12976) that I started in August. It turns out doing the backport was simpler than I expected, even with my rusty experience with Haskell. Type-checking really helps in doing the right thing, especially considering how Joey Hess implemented the fix: by introducing a new type. So I backported the patch from upstream and notified the security team that the jessie and stretch updates would be similarly easy. I shipped the backport to LTS as DLA-1144-1. I also shared the updated packages for jessie (which required a similar backport) and stretch (which didn't), and Sébastien Delafond published those as DSA 4010-1.

Graphicsmagick Up next was yet another security vulnerability in the Graphicsmagick stack. This involved the usual deep dive into intricate and sometimes just unreasonable C code to try and fit a round tree in a square sinkhole. I'm always unsure about those patches, but the test suite passes, smoke tests show the vulnerability as fixed, and that's pretty much as good as it gets. The announcement (DLA 1154-1) turned out to be a little special because I had previously noticed that the penultimate announcement (DLA 1130-1) was never sent out. So I made a merged announcement to cover both instead of re-sending the original 3 weeks late, which may have been confusing for our users.

Triage & misc We always do a bit of triage even when not on frontdesk duty, so I: I also did smaller bits of work on: The latter reminded me of the concerns I have about the long-term maintainability of the golang ecosystem: because everything is statically linked, an update to a core library (say the SMTP library, as in CVE-2017-15042, thankfully not affecting LTS) requires a full rebuild of all packages that include the library, in all distributions. So what would be a simple update in a shared library system could mean an explosion of work on statically linked infrastructures. This is a lot of work which can definitely be error-prone: as I've seen in other updates, some packages (for example the Ruby interpreter) just bit-rot on their own and eventually fail to build from source. We would also have to investigate all packages to see which ones include the library, something which we are not well equipped for at this point. Wheezy was the first release shipping golang packages but at least it's shipping only one... Stretch has shipped with two golang versions (1.7 and 1.8), which will make maintenance ever harder in the long term.
We build our computers the way we build our cities--over time, without a plan, on top of ruins. - Ellen Ullman

Other free software work This month again, I was busy doing some serious yak shaving operations all over the internet, on top of publishing two of my largest LWN articles to date (2017-10-16-strategies-offline-pgp-key-storage and 2017-10-26-comparison-cryptographic-keycards).

feed2exec beta Since I announced this new project last month I have released it as a beta and it entered Debian. I have also written useful plugins like the wayback plugin that saves pages on the Wayback Machine for eternal archival. The archive plugin can also similarly save pages to the local filesystem. I also added bash completion, expanded unit tests and documentation, fixed default file paths and a bunch of bugs, and refactored the code. Finally, I also started using two external Python libraries instead of rolling my own code: the pyxdg and requests-file libraries, the latter of which I packaged in Debian (and fixed a bug in their test suite). The program is working pretty well for me. The only thing I feel is really missing now is a retry/fail mechanism. Right now, it's a little brittle: any network hiccup will yield an error email, which is readable to me but could be confusing to a new user. Strangely enough, I am particularly having trouble with (local!) DNS resolution that I need to look into, but that is probably unrelated to the software itself. Thankfully, the user can disable those with --loglevel=ERROR to silence WARNINGs. Furthermore, some plugins still have some rough edges. For example, the Transmission integration would probably work better as a distinct plugin instead of a simple exec call, because when it adds new torrents, the output is totally cryptic. That plugin could also leverage more feed parameters to save different files in different locations depending on the feed titles, something that would be hard to do safely with the exec plugin now. I am keeping a steady flow of releases. I wish there was a way to see how effective I am at reaching out with this project, but unfortunately GitLab doesn't provide usage statistics... And I have received only a few comments on IRC about the project, so maybe I need to reach out more, like it says in the fine manual. Always feels strange to have to promote your project like it's some new bubbly soap... Next steps for the project are a final review of the API and a production-ready 1.0.0 release. I am also thinking of making a small screencast to show the basic capabilities of the software, maybe with asciinema's upcoming audio support?

Pandoc filters As I mentioned earlier, I dove again in Haskell programming when working on the git-annex security update. But I also have a small Haskell program of my own - a Pandoc filter that I use to convert the HTML articles I publish on LWN.net into a Ikiwiki-compatible markdown version. It turns out the script was still missing a bunch of stuff: image sizes, proper table formatting, etc. I also worked hard on automating more bits of the publishing workflow by extracting the time from the article which allowed me to simply extract the full article into an almost final copy just by specifying the article ID. The only thing left is to add tags, and the article is complete. In the process, I learned about new weird Haskell constructs. Take this code, for example:
-- remove needless blockquote wrapper around some tables
--
-- haskell newbie tips:
--
-- @ is the "at-pattern", allows us to define both a name for the
-- construct and inspect the contents as once
--
-- {} is the "empty record pattern": it basically means "match the
-- arguments but ignore the args"
cleanBlock (BlockQuote t@[Table {}]) = t
Here the idea is to remove <blockquote> elements needlessly wrapping a <table>. I can't specify the Table type on its own, because then I couldn't address the table as a whole, only its parts. I could reconstruct the whole table bit by bit, but it wasn't as clean. The other pattern was how to, at last, match multiple string elements, which was difficult because Pandoc treats spaces specially:
cleanBlock (Plain (Strong (Str "Notifications":Space:Str "for":Space:Str "all":Space:Str "responses":_):_)) = []
The last bit that drove me crazy was the date parsing:
-- the "GAByline" div has a date, use it to generate the ikiwiki dates
--
-- this is distinct from cleanBlock because we do not want to have to
-- deal with time there: it is only here we need it, and we need to
-- pass it in here because we do not want to mess with IO (time is I/O
-- in haskell) all across the function hierarchy
cleanDates :: ZonedTime -> Block -> [Block]
-- this mouthful is just the way the data comes in from
-- LWN/Pandoc. there could be a cleaner way to represent this,
-- possibly with a record, but this is complicated and obscure enough.
cleanDates time (Div (_, [cls], _)
                 [Para [Str month, Space, Str day, Space, Str year], Para _])
    | cls == "GAByline" = ikiwikiRawInline (ikiwikiMetaField "date"
                                           (iso8601Format (parseTimeOrError True defaultTimeLocale "%Y-%B-%e,"
                                                           (year ++ "-" ++ month ++ "-" ++ day) :: ZonedTime)))
                        ++ ikiwikiRawInline (ikiwikiMetaField "updated"
                                             (iso8601Format time))
                        ++ [Para []]
-- other elements just pass through
cleanDates time x = [x]
Now that seems just dirty, but it was even worse before. One thing I find difficult in adapting to coding in Haskell is that you need to take the habit of writing smaller functions. The language is really not well adapted to long discourse: it's more about getting small things connected together. Other languages (e.g. Python) discourage this because there's some overhead in calling functions (10 nanoseconds in my tests, but still), whereas functions are a fundamental and important construct in Haskell and are much more heavily optimized. So I constantly need to remind myself to split things up early, otherwise I can't do anything in Haskell. Other languages are more lenient, which does mean my code can be more dirty, but I feel I get things done faster that way. The oddity of Haskell makes it frustrating to work with. It's like doing construction work but you're not allowed to get the floor dirty. When I build stuff, I don't mind things being dirty: I can clean up afterwards. This is especially critical when you don't actually know how to make things clean in the first place, as Haskell will simply not let you do that at all. And obviously, I fought with Monads, or, more specifically, "I/O" or IO in this case. Turns out that getting the current time is IO in Haskell: indeed, it's not a "pure" function that will always return the same thing. But this means that I would have had to change the signature of all the functions that touched time to include IO. I eventually moved the time initialization up into main so that I had only one IO function, and passed that timestamp downwards as a simple argument. That way I could keep the rest of the code clean, which seems to be an acceptable pattern. I would of course be happy to get feedback from my Haskell readers (if any) to see how to improve that code. I am always eager to learn.

Git remote MediaWiki Few people know that there is a MediaWiki remote for Git which allows you to mirror a MediaWiki site as a Git repository. As a disaster recovery mechanism, I have been keeping such a historical backup of the Amateur radio wiki for a while now. This originally started as a homegrown Python script that also converted the contents to Markdown. My theory then was to see if we could switch from MediaWiki to Ikiwiki, but it took so long to implement that I never completed the work. When someone had the weird idea of renaming a page to some impossibly long name on the wiki, my script broke. I tried to look at fixing it and then remembered I also had a mirror running using the Git remote. It turns out it also broke on the same issue and that got me looking into the remote again. I got lost in a zillion issues, including fixing that specific issue, but I especially looked at the possibility of fetching all namespaces, because I realized that the remote fetches only a part of the wiki by default. And that drove me to submit namespace support as a patch to the git mailing list. Finally, the discussion came back to how to actually maintain that contrib: in git core or outside? It looks like I'll be doing some maintenance on that project outside of git, as I was granted access to the GitHub organisation...
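For context, the remote helper makes a wiki look like any other git remote; a rough sketch of basic usage (the URL is a placeholder), assuming the git-remote-mediawiki helper is installed:
# mirror a MediaWiki site as a git repository (pages end up as .mw files)
git clone mediawiki::https://wiki.example.com/ wiki-mirror
# later, pull new revisions like any other remote
cd wiki-mirror && git pull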

Galore Yak Shaving Then there's the usual hodgepodge of fixes and random things I did over the month.
There is no [web extension] only XUL! - Inside joke
